1. FIELD OF INVENTION

The present invention is in the field of molecular biology and
computer science; more particularly, the present invention
describes methods of analyzing gene transcripts and diagnosing
the genetic expression of cells and tissue.

2. BACKGROUND OF THE INVENTION

Until very recently, the history of molecular biology has been
written one gene at a time. Scientists have observed the cell's
physical changes, isolated mixtures from the cell or its milieu,
purified proteins, sequenced proteins and therefrom constructed
probes to look for the corresponding gene.

Recently, different nations have set up massive projects to
sequence the billions of bases in the human genome. These
projects typically begin with dividing the genome into large
portions of chromosomes and then determining the sequences of
these pieces, which are then analyzed for identity with known
proteins or portions thereof, known as motifs. Unfortunately,
the majority of genomic DNA does not encode proteins and, though
it is postulated to have some effect on the cell's ability to
make protein, its relevance to medical applications is not
understood at this time.

A third methodology involves sequencing only the transcripts
encoding the cellular machinery actively involved in making
protein, namely the mRNA. The advantage is that the cell has
already edited out all the non-coding DNA, and it is relatively
easy to identify the protein-coding portion of the RNA. The
utility of this approach was not immediately obvious to genomic
researchers; in fact, when cDNA sequencing was initially
proposed, the method was roundly denounced by those committed to
genomic sequencing. For example, the head of the U.S. Human
Genome Project discounted cDNA sequencing as not valuable and
refused to approve funding of projects.

In this disclosure, we teach methods for analyzing DNA,
including cDNA libraries. Based on our analyses and research, we
see each individual gene product as a "pixel" of information,
which relates to the expression of that, and only that, gene. We
teach herein methods whereby the individual "pixels" of gene
expression information can be combined into a single gene
transcript "image," in which each of the individual genes can be
visualized simultaneously, allowing relationships between the
gene pixels to be easily visualized and understood.

We further teach a new method which we call electronic
subtraction. Electronic subtraction will enable the gene
researcher to turn a single image into a moving picture, one
which describes the temporality or dynamics of gene expression
at the level of a cell or a whole tissue. It is that sense of
"motion" of cellular machinery on the scale of a cell or organ
which constitutes the new invention herein. This constitutes a
new view into the process of living cell physiology and one
which holds great promise to unveil and discover new therapeutic
and diagnostic approaches in medicine.
We teach another method which we call "electronic northern,"
which tracks the expression of a single gene across many types
of cells and tissues.

Nucleic acids (DNA and RNA) carry within their sequence the
hereditary information and are therefore the prime molecules of
life. Nucleic acids are found in all living organisms including
bacteria, fungi, viruses, plants and animals. It is of interest
to determine the relative abundance of different discrete
nucleic acids in different cells, tissues and organisms over
time under various conditions, treatments and regimes.

All dividing cells in the human body contain the same set of 23
pairs of chromosomes. It is estimated that these autosomal and
sex chromosomes encode approximately 100,000 genes. The
differences among different types of cells are believed to
reflect the differential expression of the 100,000 or so genes.
Fundamental questions of biology could be answered by
understanding which genes are transcribed and knowing the
relative abundance of transcripts in different cells.

Previously, the art has only provided for the analysis of a few
known genes at a time by standard molecular biology techniques
such as PCR, northern blot analysis, or other types of DNA probe
analysis such as in situ hybridization. Each of these methods
allows one to analyze the transcription of only known genes
and/or small numbers of genes at a time. Nucl. Acids Res. 19,
7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990); Nucl.
Acids Res. 18, 2789-92 (1989); European J. Neuroscience 2,
1063-1073 (1990); Analytical Biochem. 187, 364-73 (1990); Genet.
Annals Techn. Appl. 7, 64-70 (1990); GATA 8(4), 129-33 (1991);
Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988); Nucl. Acids
Res. 19, 1954 (1991); Proc. Natl. Acad. Sci. USA 88, 1943-47
(1991); Nucl. Acids Res. 19, 6123-27 (1991); Proc. Natl. Acad.
Sci. USA 85, 5738-42 (1988); Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose transcription is
induced or otherwise regulated during cell processes such as
activation, differentiation, aging, viral transformation,
morphogenesis, and mitosis have been pursued for many years,
using a variety of methodologies. One of the earliest methods
was to isolate and analyze levels of the proteins in a cell,
tissue, organ system, or even organisms both before and after
the process of interest. One method of analyzing multiple
proteins in a sample is using 2-dimensional gel electrophoresis,
wherein proteins can be, in principle, identified and quantified
as individual bands, and ultimately reduced to a discrete
signal. At present, 2-dimensional analysis only resolves
approximately 15% of the proteins. In order to positively
analyze those bands which are resolved, each band must be
excised from the membrane and subjected to protein sequence
analysis using Edman degradation. Unfortunately, most of the
bands were present in quantities too small to obtain a reliable
sequence, and many of those bands contained more
than one discrete protein. An additional difficulty is that many
of the proteins were blocked at the amino-terminus, further
complicating the sequencing process.

Analyzing differentiation at the gene transcription level has
overcome many of these disadvantages and drawbacks, since the
power of recombinant DNA technology allows amplification of
signals containing very small amounts of material. The most
common method, called "hybridization subtraction," involves
isolation of mRNA from the biological specimen before (B) and
after (A) the developmental process of interest, transcribing
one set of mRNA into cDNA, subtracting specimen B from specimen
A (mRNA from cDNA) by hybridization, and constructing a cDNA
library from the non-hybridizing mRNA fraction. Many different
groups have used this strategy successfully, and a variety of
procedures have been published and improved upon using this same
basic scheme. Nucl. Acids Res. 19, 7097-7104 (1991); Nucl. Acids
Res. 18, 4833-42 (1990); Nucl. Acids Res. 18, 2789-92 (1989);
European J. Neuroscience 2, 1063-1073 (1990); Analytical
Biochem. 187, 364-73 (1990); Genet. Annals Techn. Appl. 7, 64-70
(1990); GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85,
1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc. Natl.
Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27
(1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988); Nucl.
Acids Res. 16, 10937 (1988).

Although each of these techniques has particular strengths and
weaknesses, there are still some limitations and undesirable
aspects of these methods. First, the time and effort required to
construct such libraries is quite large. Typically, a trained
molecular biologist might expect construction and
characterization of such a library to require 3 to 6 months,
depending on the level of skill, experience, and luck. Second,
the resulting subtraction libraries are typically inferior to
the libraries constructed by standard methodology. A typical
conventional cDNA library should have a clone complexity of at
least 10^6 clones, and an average insert size of 1-3 kB. In
contrast, subtracted libraries can have complexities of 10^2 or
10^3 and average insert sizes of 0.2 kB. Therefore, there can be
a significant loss of clone and sequence information associated
with such libraries. Third, this approach allows the researcher
to capture only the genes induced in specimen A relative to
specimen B, not vice-versa, nor does it easily allow comparison
to a third specimen of interest (C). Fourth, this approach
requires very large amounts (hundreds of micrograms) of "driver"
mRNA (specimen B), which significantly limits the number and
type of subtractions that are possible since many tissues and
cells are very difficult to obtain in large quantities.

Fifth, the resolution of the subtraction is dependent upon the
physical properties of DNA:DNA or RNA:DNA hybridization. The
ability of a given sequence to find a hybridization match is
dependent on its unique CoT value. The CoT value is a function
of the number of copies (concentration) of the particular
sequence, multiplied by the time of hybridization. It follows
that for sequences which are abundant, hybridization events will
occur very rapidly (low CoT value), while rare sequences will
form duplexes at very high CoT values. CoT values which allow
such rare sequences to form duplexes and therefore be
effectively selected are difficult to achieve in a convenient
time frame. Therefore, hybridization subtraction is simply not a
useful technique with which to study relative levels of rare
mRNA species. Sixth, this problem is further complicated by the
fact that duplex formation is also dependent on the nucleotide
base composition for a given sequence. Those sequences rich in
G + C form stronger duplexes than those with high contents of
A + T. Therefore, the former sequences will tend to be removed
selectively by hybridization subtraction. Seventh, it is
possible that hybridization between nonexact matches can occur.
When this happens, the expression of a homologous gene may
"mask" expression of a gene of interest, artificially skewing
the results for that particular gene.
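
The dependence of duplex formation on concentration multiplied
by time can be illustrated with the standard second-order
reassociation model, in which the fraction of a sequence still
single-stranded is 1 / (1 + k * C0 * t). The Python sketch below
is illustrative only: the rate constant k, the two example
concentrations, and the function names are assumptions chosen
for the example and are not values taken from this disclosure.

```python
def fraction_single_stranded(c0, t, k=1.0):
    """Second-order reassociation: C/C0 = 1 / (1 + k * C0 * t).

    c0 is the starting concentration of one sequence, t the
    hybridization time, and k an assumed rate constant.
    """
    return 1.0 / (1.0 + k * c0 * t)


def time_to_half_reassociation(c0, k=1.0):
    """Time at which half of the sequence has formed duplexes.

    Setting C/C0 = 0.5 in the model gives C0 * t = 1 / k.
    """
    return 1.0 / (k * c0)


# Hypothetical concentrations (arbitrary units): the rare
# transcript is 1000-fold less concentrated than the abundant one.
abundant_c0 = 1e-2
rare_c0 = 1e-5

t_abundant = time_to_half_reassociation(abundant_c0)
t_rare = time_to_half_reassociation(rare_c0)

print("half-reassociation time, abundant:", t_abundant)
print("half-reassociation time, rare:", t_rare)
print("rare / abundant time ratio:", t_rare / t_abundant)
```

Under this model the time needed to reach a given extent of
hybridization scales inversely with concentration, which is why
rare transcripts are hard to select in a convenient time frame.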

Matsubara and Okubo proposed using partial cDNA sequences to
establish expression profiles of genes which could be used in
functional analyses of the human genome. Matsubara and Okubo
warned against using random priming, as it creates multiple
unique DNA fragments from individual mRNAs and may thus skew the
analysis of the number of particular mRNAs per library. They
sequenced randomly selected members from a 3'-directed cDNA
library and established the frequency of appearance of the
various ESTs. They proposed comparing lists of ESTs from various
cell types to classify genes. Genes expressed in many different
cell types were labeled housekeepers and those selectively
expressed in certain cells were labeled cell-specific genes,
even in the absence of the full sequence of the gene or the
biological activity of the gene product.

The present invention avoids the drawbacks of the prior art by
providing a method to quantify the relative abundance of
multiple gene transcripts in a given biological specimen by the
use of high-throughput sequence-specific analysis of individual
RNAs and/or their corresponding cDNAs.

The present invention offers several advantages over current
protein discovery methods which attempt to isolate individual
proteins based upon biological effects. The method of the
instant invention provides for detailed diagnostic comparisons
of cell profiles revealing numerous changes in the expression of
individual transcripts.

The instant invention provides several advantages over current
subtraction methods including a more complex library analysis
(10^6 to 10^7 clones as compared to 10^3 clones) which allows
identification of low abundance messages as well as enabling the
identification of messages which either increase or decrease in
abundance. These large libraries are very routine to make in
contrast to the libraries of previous methods. In addition,
homologues can easily be distinguished with the method of the
instant invention.

This method is very convenient because it organizes a large
quantity of data into a comprehensible, digestible format. The
most significant differences are highlighted by electronic
subtraction. In-depth analyses are made more convenient.

The present invention provides several advantages over previous
methods of electronic analysis of cDNA. The method is
particularly powerful when more than 100 and preferably more
than 1,000 gene transcripts are analyzed. In such a case, new
low-frequency transcripts are discovered and tissue typed.

High resolution analysis of gene expression can be used directly
as a diagnostic profile or to identify disease-specific genes
for the development of more classic diagnostic approaches.

This process is defined as gene transcript frequency analysis.
The resulting quantitative analysis of the gene transcripts is
defined as comparative gene transcript analysis.

3. SUMMARY OF THE INVENTION

The invention is a method of analyzing a specimen containing
gene transcripts comprising the steps of (a) producing a library
of biological sequences; (b) generating a set of transcript
sequences, where each of the transcript sequences in said set is
indicative of a different one of the biological sequences of the
library; (c) processing the transcript sequences in a programmed
computer (in which a database of reference transcript sequences
indicative of reference sequences is stored), to generate an
identified sequence value for each of the transcript sequences,
where each said identified sequence value is indicative of a
sequence annotation and a degree of match between one of the
biological sequences of the library and at least one of the
reference sequences; and (d) processing each said identified
sequence value to generate final data values indicative of the
number of times each identified sequence value is present in the
library.
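
Step (d) above amounts to tallying how often each identified
sequence value occurs. A minimal Python sketch follows; it
assumes, purely for illustration, that each identified sequence
value is represented as an (annotation, match degree) pair, and
the entry names "LOCUS_A" and "LOCUS_B" are hypothetical
placeholders rather than real database entries. It is not the
program of Table 5.

```python
from collections import Counter


def final_data_values(identified_sequence_values):
    """Count how many times each identified sequence value occurs.

    Each value is assumed to be a hashable (annotation, match_degree)
    pair produced by an earlier matching step.
    """
    return Counter(identified_sequence_values)


# Hypothetical output of step (c) for a five-clone library.
library_values = [
    ("LOCUS_A", "exact"),
    ("LOCUS_B", "homologous"),
    ("LOCUS_A", "exact"),
    ("LOCUS_A", "homologous"),
    ("LOCUS_A", "exact"),
]

counts = final_data_values(library_values)
for value, n in counts.most_common():
    print(value, n)
```

Keying the tally on both annotation and match degree keeps exact
and homologous matches to the same entry as separate counts.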

The invention also includes a method of comparing two specimens
containing gene transcripts. The first specimen is processed as
described above. The second specimen is used to produce a second
library of biological sequences, which is used to generate a
second set of transcript sequences, where each of the transcript
sequences in the second set is indicative of one of the
biological sequences of the second library. Then the second set
of transcript sequences is processed in a programmed computer to
generate a second set of identified sequence values, namely the
further identified sequence values, each of which is indicative
of a sequence annotation and includes a degree of match between
one of the biological sequences of the second library and at
least one of the reference sequences. The further identified
sequence values are processed to generate further final data
values indicative of the number of times each further identified
sequence value is present in the second library. The final data
values from the first specimen and the further identified
sequence values from the second specimen are processed to
generate ratios of transcript sequences, which indicate the
differences in the number of gene transcripts between the two
specimens.

In a further embodiment, the method includes quantifying the
relative abundance of mRNA in a biological specimen by (a)
isolating a population of mRNA transcripts from a biological
specimen; (b) identifying genes from which the mRNA was
transcribed by a sequence-specific method; (c) determining the
numbers of mRNA transcripts corresponding to each of the genes;
and (d) using the mRNA transcript numbers to determine the
relative abundance of mRNA transcripts within the population of
mRNA transcripts.
Also disclosed is a method of producing a gene transcript image
analysis by first obtaining a mixture of mRNA, from which cDNA
copies are made. The cDNA is inserted into a suitable vector
which is used to transfect suitable host strain cells which are
plated out and permitted to grow into clones, each clone
representing a unique mRNA. A representative population of
clones transfected with cDNA is isolated. Each clone in the
population is identified by a sequence-specific method which
identifies the gene from which the unique mRNA was transcribed.
The number of times each gene is identified to a clone is
determined to evaluate gene transcript abundance. The genes and
their abundances are listed in order of abundance to produce a
gene transcript image.
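
The final listing step can be sketched as follows. This Python
example assumes each sequenced clone has already been assigned a
gene identifier; the identifiers "g1" to "g3", the tie-breaking
rule, and the added relative-abundance column are illustrative
assumptions, not details specified in this disclosure.

```python
from collections import Counter


def gene_transcript_image(clone_gene_ids):
    """Rank genes by how many clones were identified to each gene.

    Returns rows of (rank, gene, clone_count, fraction_of_library),
    sorted by decreasing count; ties are broken by gene name so the
    output is deterministic.
    """
    counts = Counter(clone_gene_ids)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, gene, n, n / total)
        for rank, (gene, n) in enumerate(ranked, start=1)
    ]


# Hypothetical identifications for six clones.
image = gene_transcript_image(["g2", "g1", "g1", "g3", "g1", "g2"])
for row in image:
    print(row)
```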
In a further embodiment, the relative abundance of the gene
transcripts in one cell type or tissue is compared with the
relative abundance of gene transcript numbers in a second cell
type or tissue in order to identify the differences and
similarities.

In a further embodiment, the method includes a system for
analyzing a library of biological sequences including a means
for receiving a set of transcript sequences, where each of the
transcript sequences is indicative of a different one of the
biological sequences of the library; and a means for processing
the transcript sequences in a computer system in which a
database of reference transcript sequences indicative of
reference sequences is stored, wherein the computer is
programmed with software for generating an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence annotation
and the degree of match between a different one of the
biological sequences of the library and at least one of the
reference sequences, and for processing each said identified
sequence value to generate final data values indicative of the
number of times each identified sequence value is present in the
library.

In essence, the invention is a method and system for quantifying
the relative abundance of gene transcripts in a biological
specimen. The invention provides a method for comparing the gene
transcript image from two or more different biological specimens
in order to distinguish between the two specimens and identify
one or more genes which are differentially expressed between the
two specimens. Thus, this gene transcript image and its
comparison can be used as a diagnostic. One embodiment of the
method generates high-throughput sequence-specific analysis of
multiple RNAs or their corresponding cDNAs: a gene transcript
image. Another embodiment of the method produces the gene
transcript imaging analysis by the use of high-throughput cDNA
sequence analysis. In addition, two or more gene transcript
images can be compared and used to detect or diagnose a
particular biological state, disease, or condition which is
correlated to the relative abundance of gene transcripts in a
given cell or population of cells.

4. DESCRIPTION OF THE TABLES AND DRAWINGS

4.1. TABLES

Table 1 presents a detailed explanation of the letter codes
utilized in Tables 2-5.

Table 2 lists the one hundred most common gene transcripts. It
is a partial list of isolates from the HUVEC cDNA library
prepared and sequenced as described below. The left-hand column
refers to the sequence's order of abundance in this table. The
next column, labeled "number," is the clone number of the first
HUVEC sequence identification reference matching the sequence in
the "entry" column. Isolates that have not been sequenced are
not present in Table 2. The next column, labeled "N," indicates
the total number of cDNAs which have the same degree of match
with the sequence of the reference transcript in the "entry"
column.

The column labeled "entry" gives the NIH GENBANK locus name,
which corresponds to the library sequence numbers. The "s"
column indicates in a few cases the species of the reference
sequence. The code for column "s" is given in Table 1. The
column labeled "descriptor" provides a plain English explanation
of the identity of the sequence corresponding to the NIH GENBANK
locus name in the "entry" column.

Table 3 is a comparison of the top fifteen most abundant gene
transcripts in normal monocytes and activated macrophage cells.

Table 4 is a detailed summary of library subtraction analysis
comparing the THP-1 and human macrophage cDNA sequences. In
Table 4, the same code as in Table 2 is used. Additional columns
are for "bgfreq" (abundance number in the subtractant library),
"rfend" (abundance number in the target library) and "ratio"
(the target abundance number divided by the subtractant
abundance number). As is clear from perusal of the table, when
the abundance number in the subtractant library is "0", the
target abundance number is divided by 0.05. This is a way of
obtaining a result (not possible dividing by 0) and
distinguishing the result from ratios of subtractant numbers of
1.
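
The ratio column described for Table 4 can be sketched in Python
as below. The column names bgfreq and rfend and the 0.05
substitute divisor come from the description above; the function
name, the data layout, the sort order, and the example entries
are assumptions made for illustration, and this is not the
source code of Table 5.

```python
def subtraction_ratios(target_counts, subtractant_counts, zero_substitute=0.05):
    """Compute target/subtractant abundance ratios per database entry.

    target_counts and subtractant_counts map an entry name to its
    abundance number. When the subtractant abundance is 0, the target
    abundance is divided by zero_substitute instead. Only entries
    present in the target library are reported.
    """
    rows = []
    for entry, rfend in target_counts.items():
        bgfreq = subtractant_counts.get(entry, 0)
        divisor = bgfreq if bgfreq > 0 else zero_substitute
        rows.append((entry, bgfreq, rfend, rfend / divisor))
    # Largest ratios first; ties broken by entry name.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


# Hypothetical abundance numbers for three entries.
target = {"ENTRY_A": 4, "ENTRY_B": 2, "ENTRY_C": 3}
subtractant = {"ENTRY_A": 2, "ENTRY_C": 0}

table = subtraction_ratios(target, subtractant)
for entry, bgfreq, rfend, ratio in table:
    print(entry, bgfreq, rfend, ratio)
```

Because 0.05 is smaller than any nonzero abundance number, an
entry absent from the subtractant library always yields a larger
ratio than the same target abundance divided by 1.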

Table 5 is the computer program, written in source code, for
generating gene transcript subtraction profiles.

Table 6 is a partial listing of database entries used in the
electronic northern blot analysis as provided by the present
invention.

4.2. BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a chart summarizing data collected and stored
regarding the library construction portion of sequence
preparation and analysis.

Figure 2 is a diagram representing the sequence of operations
performed by "abundance sort" software in a class of preferred
embodiments of the inventive method.

Figure 3 is a block diagram of a preferred embodiment of the
system of the invention.

Figure 4 is a more detailed block diagram of the bioinformatics
process from new sequence (that has already been sequenced but
not identified) to printout of the transcript imaging analysis
and the provision of database subscriptions.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method to compare the relative
abundance of gene transcripts in different biological specimens
by the use of high-throughput sequence-specific analysis of
individual RNAs or their corresponding cDNAs (or alternatively,
of data representing other biological sequences). This process
is denoted herein as gene transcript imaging. The quantitative
analysis of the relative abundance for a set of gene transcripts
is denoted herein as "gene transcript image analysis" or "gene
transcript frequency analysis". The present invention allows one
to obtain a profile for gene transcription in any given
population of cells or tissue from any type of organism. The
invention can be applied to obtain a profile of a specimen
consisting of a single cell (or clones of a single cell), or of
many cells, or of tissue more complex than a single cell and
containing multiple cell types, such as liver.
The invention has significant advantages in the fields of
diagnostics, toxicology and pharmacology, to name a few. A
highly sophisticated diagnostic test can be performed on the ill
patient in whom a diagnosis has not been made. A biological
specimen consisting of the patient's fluids or tissues is
obtained, and the gene transcripts are isolated and expanded to
the extent necessary to determine their identity. Optionally,
the gene transcripts can be converted to cDNA. A sampling of the
gene transcripts is subjected to sequence-specific analysis and
quantified. These gene transcript sequence abundances are
compared against reference database sequence abundances
including normal data sets for diseased and healthy patients.
The patient has the disease(s) with which the patient's data set
most closely correlates.
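
The comparison against reference data sets can be sketched as a
best-correlation lookup. The text does not name a correlation
measure, so the Python example below uses the Pearson
coefficient as an assumption; the profile names, gene
identifiers, and abundance numbers are hypothetical and carry no
diagnostic meaning.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)


def closest_reference(patient, references):
    """Return the reference profile name that correlates best.

    patient maps gene -> abundance; references maps profile name ->
    (gene -> abundance). Genes missing from a profile count as 0.
    """
    genes = sorted(set(patient).union(*(set(r) for r in references.values())))
    patient_vec = [patient.get(g, 0) for g in genes]
    scores = {
        name: pearson(patient_vec, [ref.get(g, 0) for g in genes])
        for name, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


patient_profile = {"g1": 10, "g2": 1, "g3": 0}
reference_profiles = {
    "healthy": {"g1": 1, "g2": 9, "g3": 2},
    "condition_x": {"g1": 8, "g2": 2, "g3": 1},
}
best, scores = closest_reference(patient_profile, reference_profiles)
print(best, scores)
```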
For example, gene transcript frequency analysis can be used to
differentiate normal cells or tissues from diseased cells or
tissues, just as it highlights differences between normal
monocytes and activated macrophages in Table 3.

In toxicology, a fundamental question is which tests are most
effective in predicting or detecting a toxic effect. Gene
transcript imaging provides highly detailed information on the
cell and tissue environment, some of which would not be obvious
in conventional, less detailed screening methods. The gene
transcript image is a more powerful method to predict drug
toxicity and efficacy. Similar benefits accrue in the use of
this tool in pharmacology. The gene transcript image can be used
selectively to look at protein categories which are expected to
be affected, for example, enzymes which detoxify toxins.

In an alternative embodiment, comparative gene transcript
frequency analysis is used to differentiate between cancer cells
which respond to anti-cancer agents and those which do not
respond. Examples of anti-cancer agents are tamoxifen,
vincristine, vinblastine, podophyllotoxins, etoposide,
teniposide, cisplatin, biologic response modifiers such as
interferon, IL-2, GM-CSF, enzymes, hormones and the like. This
method also provides a means for sorting the gene transcripts by
functional category. In the case of cancer cells, transcription
factors or other essential regulatory molecules are very
important categories to analyze across different libraries.

In yet another embodiment, comparative gene transcript frequency
analysis is used to differentiate between control liver cells
and liver cells isolated from patients treated with experimental
drugs like FIAU to distinguish between pathology caused by the
underlying disease and that caused by the drug.

In yet another embodiment, comparative gene transcript frequency
analysis is used to differentiate between brain tissue from
patients treated and untreated with lithium.

In a further embodiment, comparative gene transcript frequency
analysis is used to differentiate between cyclosporin and
FK506-treated cells and normal cells.

In a further embodiment, comparative gene transcript frequency
analysis is used to differentiate between virally infected
(including HIV-infected) human cells and uninfected human cells.
Gene transcript frequency analysis is also used to rapidly
survey gene transcripts in HIV-resistant, HIV-infected, and
HIV-sensitive cells. Comparison of gene transcript abundance
will indicate the success of treatment and/or new avenues to
study.
In a further embodiment, comparative gene transcript frequency
analysis is used to differentiate between bronchial lavage
fluids from healthy patients and from unhealthy patients with a
variety of ailments.

In a further embodiment, comparative gene transcript frequency
analysis is used to differentiate between cell, plant, microbial
and animal mutants and wild-type species. In addition, the
transcript abundance program is adapted to permit the scientist
to evaluate the transcription of one gene in many different
tissues. Such comparisons could
... assemble sequenced DNA fragments into Assemblages, a special
grouping of data where the relationships between sequences are
shown by graphic overlap, alignment and statistical views. The
process is based on the Meyers-Kececioglu model of fragment
assembly (INHERIT Assembler User's Manual, Applied Biosystems,
Inc., Foster City, CA), and uses graph theory as the foundation
of a very rigorous multiple sequence alignment engine for
assembling DNA sequence fragments. Other assembly programs that
can be used include MEGALIGN (available from DNASTAR Inc.,
Madison, WI), Dasher and STADEN (available from Roger Staden,
Cambridge, England).

Next, with reference to Fig. 2, we describe in more detail the
"abundance sort" program which implements above-mentioned "step
(b)" to tabulate the number of sequences of the library which
match each database entry (the "abundance number" for each
database entry).

Fig. 2 is a flow chart of a preferred embodiment of the
abundance sort program. A source code listing of this embodiment
of the abundance sort program is set forth in Table 5. In the
Table 5 implementation, the abundance sort program is written
using the FoxBASE programming language commercially available
from Microsoft Corporation. Although FoxBASE was the program
chosen for the first iteration of this technology, it should not
be considered limiting. Many other programming languages, Sybase
being a particularly desirable alternative, can also be used, as
will be obvious to one with ordinary skill in the art. The
subroutine names specified in Fig. 2 correspond to subroutines
listed in Table 5.

With reference again to Fig. 2, the "Identified Sequences" are
transcript sequences representing each sequence of the library
and a corresponding identification of the database entry (if
any) which it matches. In other words, the "Identified
Sequences" are transcript sequences representing the output of
above-discussed "step (a)."

Fig. 3 is a block diagram of a system for implementing the
invention. The Fig. 3 system includes library generation unit 2
which generates a library and asserts an output stream of
transcript sequences indicative of the biological sequences
comprising the library. Programmed processor 4 receives the data
stream output from unit 2 and processes this data in accordance
with above-discussed "step (a)" to generate the Identified
Sequences. Processor 4 can be a processor programmed with the
commercially available computer program known as the INHERIT 670
Sequence Analysis System and the commercially available computer
program known as the Factura program (both available from
Applied Biosystems Inc.) and with the UNIX operating system.

Still with reference to Fig. 3, the Identified Sequences are
loaded into processor 6 which is programmed with the abundance
sort program. Processor 6 generates the Final Transcript
sequences indicated in both Figs. 2 and 3. Fig. 4 shows a more
detailed block diagram of a planned relational computer system,
including various searching techniques which can be implemented,
along with an assortment of databases to query against.
20 With reference to Fig. 2, the abundance sort program 

first performs an operation known as "Tempnum" on the 
Identified Sequences, to discard all of the Identified 
Sequences except those which match database entries of 
selected types. For example, the Tempnum process can 
25 select Identified Sequences which represent matches of the 
following types with database entries (see above for 
definition): "exact" matches, human "homologous" matches, 
"other species" matches representing genes present in 
species other than human) , "no" matches (no significant 
30 regions of homology with database entries representing 
previously identified nucleotide sequences) , "I" matches 
(Incyte for not previously known DNA sequences) , or "X" 
matches (matches ESTs in reference database) . This 
eliminates the U, S, M, V, A, R and D sequence (see Table 1 
35 for definitions) . 
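The "Tempnum" selection described above can be sketched in Python as a filter on match-type codes. This is only an illustration under stated assumptions, not the Table 5 program: the record layout (a dict with a `match` key) is hypothetical, and the single-letter codes E, H, O and N are assumed stand-ins for the "exact", "homologous", "other species" and "no" match types, since only the I and X codes and the discarded U, S, M, V, A, R and D codes are named in the text.

```python
# Match-type codes kept by the selection step. "I" and "X" come from the text;
# "E", "H", "O", "N" are assumed stand-ins for exact, homologous,
# other-species and no-match (the real codes are defined in Table 1).
KEPT_MATCH_TYPES = {"E", "H", "O", "N", "I", "X"}


def tempnum(identified_sequences, kept=KEPT_MATCH_TYPES):
    """Return only the records whose match type is in the kept set."""
    return [rec for rec in identified_sequences if rec["match"] in kept]


records = [
    {"clone": 15365, "entry": "HSRPL41", "match": "E"},
    {"clone": 15004, "entry": "NCY015004", "match": "I"},
    {"clone": 15999, "entry": None, "match": "U"},  # hypothetical discarded record
]
selected = tempnum(records)
```

Keeping the selected codes in one set makes it simple to run the same filter with a different selection of match types.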

The identified sequence values selected during the
"Tempnum" process then undergo a further selection (weeding
out) operation known as "Tempred." This operation can, for
example, discard all identified sequence values
representing matches with selected database entries.

The identified sequence values selected during the
"Tempred" process are then classified according to library
during the "Tempdesig" operation. It is contemplated that
the "Identified Sequences" can represent sequences from a
single library, or from two or more libraries.

Consider first the case that the identified sequence
values represent sequences from a single library. In this
case, all the identified sequence values determined during
"Tempred" undergo sorting in the "Templib" operation,
further sorting in the "Libsort" operation, and finally
additional sorting in the "Temptarsort" operation. For
example, these three sorting operations can sort the
identified sequences in order of decreasing "abundance
number" (to generate a list of decreasing abundance
numbers, each abundance number corresponding to a unique
identified sequence entry, or several lists of decreasing
abundance numbers, with the abundance numbers in each list
corresponding to database entries of a selected type) with
redundancies eliminated from each sorted list. In this
case, the operation identified as "Cruncher" can be
bypassed, so that the "Final Data" values are the organized
transcript sequences produced during the "Temptarsort"
operation.
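The single-library sort described above amounts to counting how many clones match each database entry and listing the entries by decreasing count, one line per entry. A minimal Python sketch follows. The function name `abundance_sort` and the flat list of entry names are assumptions made for illustration, and the single pass shown here does not reproduce the separate "Templib", "Libsort" and "Temptarsort" steps.

```python
from collections import Counter


def abundance_sort(entries):
    """Count clones per database entry and list entries by decreasing count."""
    counts = Counter(entries)
    # Sort by count (descending), then by entry name for a repeatable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# One database entry name per sequenced clone (toy data).
clones = ["HSFIB1", "HSRPL41", "HSFIB1", "HSET", "HSRPL41", "HSFIB1"]
table = abundance_sort(clones)
```

Each entry appears once in the result, which corresponds to eliminating redundancies from the sorted list.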

We next consider the case that the transcript 
sequences produced during the "Tempred" operation represent 
sequences from two libraries (which we will denote the 
"target" library and the "subtractant" library). For 
example, the target library may consist of cDNA sequences 
from clones of a diseased cell, while the subtractant 
library may consist of cDNA sequences from clones of the 
diseased cell after treatment by exposure to a drug. For 
another example, the target library may consist of cDNA 
sequences from clones of a cell type from a young human, 
while the subtractant library may consist of cDNA sequences 
from clones of the same cell type from the same human at 
different ages. 






WO 95/20681                                  PCT/US95/01160

In this case, the "Tempdesig" operation routes all
transcript sequences representing the target library for
processing in accordance with "Templib" (and then "Libsort"
and "Temptarsort"), and routes all transcript sequences
representing the subtractant library for processing in
accordance with "Tempsub" (and then "Subsort" and
"Tempsubsort"). For example, the consecutive "Templib,"
"Libsort," and "Temptarsort" sorting operations sort
identified sequences from the target library in order of
decreasing abundance number (to generate a list of
decreasing abundance numbers, each abundance number
corresponding to a database entry, or several lists of
decreasing abundance numbers, with the abundance numbers in
each list corresponding to database entries of a selected
type) with redundancies eliminated from each sorted list.
The consecutive "Tempsub," "Subsort," and "Tempsubsort"
sorting operations sort identified sequences from the
subtractant library in order of decreasing abundance number
(to generate a list of decreasing abundance numbers, each
abundance number corresponding to a database entry, or
several lists of decreasing abundance numbers, with the
abundance numbers in each list corresponding to database
entries of a selected type) with redundancies eliminated
from each sorted list.

The transcript sequences output from the "Temptarsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
target library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type). Similarly, the
transcript sequences output from the "Tempsubsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
subtractant library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type).






The transcript sequences (sorted lists) output from
the Tempsubsort and Temptarsort sorting operations are
combined during the operation identified as "Cruncher."
The "Cruncher" process identifies pairs of corresponding
target and subtractant abundance numbers (both representing
the same identified sequence value), and divides one by the
other to generate a "ratio" value for each pair of
corresponding abundance numbers, and then sorts the ratio
values in order of decreasing ratio value. The data output
from the "Cruncher" operation (the Final Transcript
sequence in Fig. 2) is typically a sorted list from which a
histogram could be generated in which position along one
axis indicates the size of a ratio of abundance numbers
(for corresponding identified sequence values from target
and subtractant libraries) and position along another axis
indicates identified sequence value (e.g., gene type).

Preferably, prior to obtaining a ratio between the two
library abundance values, the Cruncher operation also
divides each ratio value by the total number of sequences
in one or both of the target and subtractant libraries.
The resulting lists of "relative" ratio values generated by
the Cruncher operation are useful for many medical,
scientific, and industrial applications. Also preferably,
the output of the Cruncher operation is a set of lists,
each list representing a sequence of decreasing ratio
values for a different selected subset (e.g., protein
family) of database entries.

In one example, the abundance sort program of the
invention tabulates for a library the numbers of mRNA
transcripts corresponding to each gene identified in a
database. These numbers are divided by the total number of
clones sampled. The results of the division reflect the
relative abundance of the mRNA transcripts in the cell type
or tissue from which they were obtained. Obtaining this
final data set is referred to herein as "gene transcript
image analysis." The resulting subtracted data show
exactly what proteins and genes are upregulated and
downregulated in highly detailed complexity.

6.6. HUVEC cDNA LIBRARY
Table 2 is an abundance table listing the various gene
transcripts in an induced HUVEC library. The transcripts
are listed in order of decreasing abundance. This
computerized sorting simplifies analysis of the tissue and
speeds identification of significant new proteins which are
specific to this cell type. This type of endothelial cell
lines tissues of the cardiovascular system, and the more
that is known about its composition, particularly in
response to activation, the more choices of protein targets
become available to affect in treating disorders of this
tissue, such as the highly prevalent atherosclerosis.
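The division by the total number of clones sampled can be illustrated with two counts from Table 2 (67 clones for HSRPL41 and 47 for HSFIB1, out of 5000 clones analyzed). The Python sketch below is an assumption about how such a step might be written. The function name `relative_abundance` is hypothetical and is not taken from Table 5.

```python
def relative_abundance(counts, total_clones):
    """Divide each entry's clone count by the total number of clones sampled."""
    return {entry: n / total_clones for entry, n in counts.items()}


# Counts for two entries of Table 2; 5000 is the "Total clones analyzed" figure.
huvec_counts = {"HSRPL41": 67, "HSFIB1": 47}
huvec_relative = relative_abundance(huvec_counts, 5000)
```

Dividing by the library size is what allows libraries of different sizes to be compared on the same scale.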

6.7. MONOCYTE-CELL AND MAST-CELL cDNA LIBRARIES

Tables 3 and 4 show truncated comparisons of two
libraries. In Tables 3 and 4 the "normal monocytes" are
the HMC-1 cells, and the "activated macrophages" are the
THP-1 cells pretreated with PMA and activated with LPS.
Table 3 lists in descending order of abundance the most
abundant gene transcripts for both cell types. With only
15 gene transcripts from each cell type, this table permits
quick, qualitative comparison of the most common
transcripts. This abundance sort, with its convenient
side-by-side display, provides an immediately useful
research tool. In this example, this research tool
discloses that 1) only one of the top 15 activated
macrophage transcripts is found in the top 15 normal
monocyte gene transcripts (poly A binding protein); and 2)
a new gene transcript (previously unreported in other
databases) is relatively highly represented in activated
macrophages but is not similarly prominent in normal
macrophages. Such a research tool provides researchers
with a short-cut to new proteins, such as receptors, cell-
surface and intracellular signalling molecules, which can
serve as drug targets in commercial drug screening
programs. Such a tool could save considerable time over
that consumed by a hit and miss discovery program aimed at
identifying important proteins in and around cells, because
those proteins carrying out everyday cellular functions and
represented as steady state mRNA are quickly eliminated
from further characterization.

This illustrates how the gene transcript profiles
change with altered cellular function. Those skilled in
the art know that the biochemical composition of cells also
changes with other functional changes such as cancer,
including cancer's various stages, and exposure to
toxicity. A gene transcript subtraction profile such as in
Table 3 is useful as a first screening tool for such gene
expression and protein studies.

6.8. SUBTRACTION ANALYSIS OF NORMAL MONOCYTE-CELL AND
ACTIVATED MONOCYTE-CELL cDNA LIBRARIES

Once the cDNA data were in the computer, the computer
program as disclosed in Table 5 was used to obtain ratios
of all the gene transcripts in the two libraries discussed
in Example 6.7, and the gene transcripts were sorted by the
descending values of their ratios. If a gene transcript is
not represented in one library, that gene transcript's
abundance is unknown but appears to be less than 1. As an
approximation (and to obtain a ratio, which would not be
possible if the unrepresented gene were given an abundance
of zero), genes which are represented in only one of the
two libraries are assigned an abundance of 1/2. Using 1/2
for unrepresented clones increases the relative importance
of "turned-on" and "turned-off" genes, whose products would
be drug candidates. The resulting print-out is called a
subtraction table and is an extremely valuable screening
method, as is shown by the following data.
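The ratio calculation with the 1/2 substitution can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the FoxBASE program of Table 5. The function name `subtraction_table`, the `floor` parameter, and the entries GENE_BOTH and GENE_OFF are hypothetical. The HSRHOB count of 5 in the activated library and 0 in the normal library is chosen to agree with rows of Table 4 that show a bgfreq of 0, a count of 5 and a ratio of 10.000.

```python
def subtraction_table(target, subtractant, floor=0.5):
    """Build (entry, subtractant count, target count, ratio) rows,
    sorted by decreasing ratio."""
    rows = []
    for entry in set(target) | set(subtractant):
        t = target.get(entry, 0)
        s = subtractant.get(entry, 0)
        # A count of zero is replaced by 1/2 so the ratio is always defined.
        ratio = (t or floor) / (s or floor)
        rows.append((entry, s, t, ratio))
    # Highest ratio first; ties broken by entry name for a repeatable order.
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows


activated = {"HSRHOB": 5, "GENE_BOTH": 6}   # GENE_BOTH is hypothetical
normal = {"GENE_BOTH": 3, "GENE_OFF": 4}    # GENE_OFF is hypothetical
table = subtraction_table(activated, normal)
```

With these inputs the "turned-on" entry (ratio 10.0) sorts to the top and the "turned-off" entry (ratio 0.125) to the bottom, which is the emphasis that the 1/2 substitution is meant to produce.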

Table 4 is a subtraction table, in which the normal
monocyte library was electronically "subtracted" from the
activated macrophage library. This table highlights most
effectively the changes in abundance of the gene
transcripts by activation of macrophages. Even among the
first 20 gene transcripts listed, there are several unknown
gene transcripts. Thus, electronic subtraction is a useful
tool with which to assist researchers in identifying much
more quickly the basic biochemical changes between two cell
types. Such a tool can save universities and
pharmaceutical companies, which spend billions of dollars on
research, valuable time and laboratory resources at the
early discovery stage and can speed up the drug development
cycle, which in turn permits researchers to set up drug
screening programs much earlier. Thus, this research tool
provides a way to get new drugs to the public faster and
more economically.

Also, such a subtraction table can be obtained for
patient diagnosis. An individual patient sample (such as
monocytes obtained from a biopsy or blood sample) can be
compared with data provided herein to diagnose conditions
associated with macrophage activation.

Table 4 uncovered many new gene transcripts (labeled
Incyte clones). Note that many genes are turned on in the
activated macrophage (i.e., the monocyte had a 0 in the
bgfreq column). This screening method is superior to other
screening techniques, such as the western blot, which are
incapable of uncovering such a multitude of discrete new
gene transcripts.

The subtraction-screening technique has also uncovered
a high number of cancer gene transcripts (oncogenes rho,
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia
mRNA) in the activated macrophage. These transcripts may
be attributed to the use of immortalized cell lines and are
inherently interesting for that reason. This screening
technique offers a detailed picture of upregulated
transcripts including oncogenes, which helps explain why
anti-cancer drugs interfere with the patient's immunity
mediated by activated macrophages. Armed with knowledge
gained from this screening method, those skilled in the art
can set up more targeted, more effective drug screening
programs to identify drugs which are differentially
effective against 1) both relevant cancers and activated
macrophage conditions with the same gene transcript
profile; 2) cancer alone; and 3) activated macrophage
conditions.

Smooth muscle senescent protein (22 kd) was 
upregulated in the activated macrophage, which indicates 
that it is a candidate to block in controlling 
inflammation. 






6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND 
HEPATITIS INFECTED LIVER CELL cDNA LIBRARIES 

In this example, rats are exposed to hepatitis virus
and maintained in the colony until they show definite signs
of hepatitis. Of the rats diagnosed with hepatitis, one
half of the rats are treated with a new anti-hepatitis
agent (AHA). Liver samples are obtained from all rats
before exposure to the hepatitis virus and at the end of
AHA treatment or no treatment. In addition, liver samples
can be obtained from rats with hepatitis just prior to AHA
treatment.

The liver tissue is treated as described in Examples
6.2 and 6.3 to obtain mRNA and subsequently to sequence
cDNA. The cDNA from each sample is processed and analyzed
for abundance according to the computer program in Table 5.
The resulting gene transcript images of the cDNA provide
detailed pictures of the baseline (control) for each animal
and of the infected and/or treated state of the animals.
cDNA data for a group of samples can be combined into a
group summary gene transcript profile for all control
samples, all samples from infected rats and all samples
from AHA-treated rats.

Subtractions are performed between appropriate
individual libraries and the grouped libraries. For
individual animals, control and post-study samples can be
subtracted. Also, if samples are obtained before and after
AHA treatment, that data from individual animals and
treatment groups can be subtracted. In addition, the data
for all control samples can be pooled and averaged. The
control average can be subtracted from averages of both
post-study AHA and post-study non-AHA cDNA samples. If
pre- and post-treatment samples are available, pre- and
post-treatment samples can be compared individually (or
electronically averaged) and subtracted.

These subtraction tables are used in two general ways.
First, the differences are analyzed for gene transcripts
which are associated with continuing hepatic deterioration
or healing. The subtraction tables are tools to isolate
the effects of the drug treatment from the underlying basic
pathology of hepatitis. Because hepatitis affects many
parameters, additional liver toxicity has been difficult to
detect with only blood tests for the usual enzymes. The
gene transcript profile and subtraction provide a much
more complex biochemical picture, which researchers have
needed to analyze such difficult problems.

Second, the subtraction tables provide a tool for
identifying clinical markers, individual proteins or other
biochemical determinants which are used to predict and/or
evaluate a clinical endpoint, such as disease, improvement
due to the drug, and even additional pathology due to the
drug. The subtraction tables specifically highlight genes
which are turned on or off. Thus, the subtraction tables
provide a first screen for a set of gene transcript
candidates for use as clinical markers. Subsequently,
electronic subtractions of additional cell and tissue
libraries reveal which of the potential markers are in fact
found in different cell and tissue libraries. Candidate
gene transcripts found in additional libraries are removed
from the set of potential clinical markers. Then, tests of
blood or other relevant samples which are known to lack and
have the relevant condition are compared to validate the
selection of the clinical marker. In this method, the
particular physiologic function of the protein transcript
need not be determined to qualify the gene transcript as a
clinical marker.
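The step of removing candidates that also occur in additional libraries can be sketched as a set operation. The Python below is a minimal illustration under stated assumptions. The function name `filter_marker_candidates` and all entry names are hypothetical, and each additional library is reduced here to the set of entries it contains.

```python
def filter_marker_candidates(candidates, other_libraries):
    """Drop candidates that also appear in any additional library."""
    # Union of all entries seen in the additional libraries
    # (an empty list of libraries yields an empty set).
    seen_elsewhere = set().union(*other_libraries)
    return [entry for entry in candidates if entry not in seen_elsewhere]


candidates = ["GENE_A", "GENE_B", "GENE_C"]           # hypothetical entries
other_libraries = [{"GENE_B", "GENE_X"}, {"GENE_C"}]  # hypothetical libraries
markers = filter_marker_candidates(candidates, other_libraries)
```

The candidates that survive this filter would still need the validation against samples with and without the condition that the text describes.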

6.10. ELECTRONIC NORTHERN BLOT
One limitation of electronic subtraction is that it is
difficult to compare more than a pair of images at once.
Once particular individual gene products are identified as
relevant to further study (via electronic subtraction or
other methods), it is useful to study the expression of
single genes in a multitude of different tissues. In the
lab, the technique of "Northern" blot hybridization is used
for this purpose. In this technique, a single cDNA, or a
probe corresponding thereto, is labeled and then hybridized
against a blot containing RNA samples prepared from a
multitude of tissues or cell types. Upon autoradiography,
the pattern of expression of that particular gene, one at a
time, can be quantitated in all the included samples.

In contrast, a further embodiment of this invention is
the computerized form of this process, termed here
"electronic northern blot." In this variation, a single
gene is queried for expression against a multitude of
prepared and sequenced libraries present within the
database. In this way, the pattern of expression of any
single candidate gene can be examined instantaneously and
effortlessly. More candidate genes can thus be scanned,
leading to more frequent and fruitfully relevant
discoveries. The computer program included as Table 5
includes a program for performing this function, and Table
6 is a partial listing of entries of the database used in
the electronic northern blot analysis.
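An "electronic northern blot" query can be pictured as looking up one entry across every library in the database and reporting its relative abundance in each. The Python sketch below is an assumption about how such a query might be structured and is not the Table 5 routine. The function name `electronic_northern`, the dictionary layout, and the library LIB_A are hypothetical, while the HUVEC counts (9 clones for HSET and 47 for HSFIB1 out of 5000) are taken from Table 2.

```python
def electronic_northern(entry, libraries):
    """Return the relative abundance of one entry in every library."""
    profile = {}
    for name, (counts, total_clones) in libraries.items():
        # An entry absent from a library has a count of zero there.
        profile[name] = counts.get(entry, 0) / total_clones
    return profile


libraries = {
    "HUVEC": ({"HSET": 9, "HSFIB1": 47}, 5000),  # counts from Table 2
    "LIB_A": ({"HSFIB1": 12}, 4000),             # hypothetical library
}
profile = electronic_northern("HSET", libraries)
```

Because each count is divided by its own library size, the resulting profile can be read across libraries of different sizes.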

6.11. PHASE I CLINICAL TRIALS

Based on the establishment of safety and effectiveness
in the above animal tests, Phase I clinical tests are
undertaken. Normal patients are subjected to the usual
preliminary clinical laboratory tests. In addition,
appropriate specimens are taken and subjected to gene
transcript analysis. Additional patient specimens are
taken at predetermined intervals during the test. The
specimens are subjected to gene transcript analysis as
described above. In addition, the gene transcript changes
noted in the earlier rat toxicity study are carefully
evaluated as clinical markers in the followed patients.
Changes in the gene transcript analyses are evaluated as
indicators of toxicity by correlation with clinical signs
and symptoms and other laboratory results. In addition,
subtraction is performed on individual patient specimens
and on averaged patient specimens. The subtraction
analysis highlights any toxicological changes in the
treated patients. This is a highly refined determinant of
toxicity. The subtraction method also annotates clinical
markers. Further subgroups can be analyzed by subtraction
analysis, including, for example, 1) segregation by
occurrence and type of adverse effect; and 2) segregation
by dosage.

6.12. GENE TRANSCRIPT IMAGING ANALYSIS IN CLINICAL STUDIES

A gene transcript imaging analysis (or multiple gene
transcript imaging analyses) is a useful tool in other
clinical studies. For example, the differences in gene
transcript imaging analyses before and after treatment can
be assessed for patients on placebo and drug treatment.
This method also effectively screens for clinical markers
to follow in clinical use of the drug.

6.13. COMPARATIVE GENE TRANSCRIPT ANALYSIS BETWEEN SPECIES

The subtraction method can be used to screen cDNA
libraries from diverse sources. For example, the same cell
types from different species can be compared by gene
transcript analysis to screen for specific differences,
such as in detoxification enzyme systems. Such testing
aids in the selection and validation of an animal model for
the commercial purpose of drug screening or toxicological
testing of drugs intended for human or animal use. When
the comparison between animals of different species is
shown in columns for each species, we refer to this as an
interspecies comparison, or zoo blot.

Embodiments of this invention may employ databases
such as those written using the FoxBASE programming
language commercially available from Microsoft Corporation.
Other embodiments of the invention employ other databases,
such as a random peptide database, a polymer database, a
synthetic oligomer database, or an oligonucleotide database
of the type described in U.S. Patent 5,270,170, issued
December 14, 1993 to Cull, et al., PCT International
Application Publication No. WO 9322684, published November
11, 1993, PCT International Application Publication No. WO
9306121, published April 1, 1993, or PCT International
Application Publication No. WO 9119818, published December
26, 1991. These four references (whose text is
incorporated herein by reference) include teaching which
may be applied in implementing such other embodiments of
the present invention.

All references referred to in the preceding text are
hereby expressly incorporated by reference herein.
Various modifications and variations of the described
method and system of the invention will be apparent to
those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has
been described in connection with specific preferred
embodiments, it should be understood that the invention as
claimed should not be unduly limited to such specific
embodiments.

TABLE 2

Clone numbers 15000 through 20000
Libraries: HUVEC
Arranged by ABUNDANCE
Total clones analyzed: 5000

319 genes, for a total of 1713 clones

     number       N            entry      s  descriptor
  1  15365        67           HSRPL41       Riboptn L41
  2  15004        65           NCY015004     INCYTE 015004
  3  15638        63           NCY015638     INCYTE 015638
  4  15390        50           NCY015390     INCYTE 015390
  5  15193        47           HSFIB1        Fibronectin
  6  15220        47           RRRPL9     R  Riboptn L9
  7  15280        [illegible]  NCY015280     INCYTE 015280
  8  15583        33           M62060        EST HHCH09 (IGR)
  9  15662        31           HSACTCGR      Actin, gamma
 10  15026        29           NCY015026     INCYTE 015026
 11  15279        24           HSEF1AR       Elf 1-alpha
 12  15027        23           NCY015027     INCYTE 015027
 13  15033        20           NCY015033     INCYTE 015033
 14  15198        20           NCY015198     INCYTE 015198
 15  15809        20           HSCOLL1       Collagenase
 16  15221        19           NCY015221     INCYTE 015221
 17  15263        19           NCY015263     INCYTE 015263
 18  15290        19           NCY015290     INCYTE 015290
 19  15350        18           NCY015350     INCYTE 015350
 20  15030        17           NCY015030     INCYTE 015030
 21  15234        17           NCY015234     INCYTE 015234
 22  15459        16           NCY015459     INCYTE 015459
 23  15353        15           NCY015353     INCYTE 015353
 24  15378        15           S76965        Ptn kinase inhib
 25  15255        14           HUMTHYB4      Thymosin beta-4
 26  15401        14           HSLIPCR       Lipocortin I
 27  15425        14           HSPOLYAB      Poly-A bp
 28  18212        14           HUMTHYMA      Thymosin, alpha
 29  [illegible]  [illegible]  HSHRP1        Motility relat ptn; MRP-1; CD-9
 30  15189        13           HS18D         Interferon induc ptn 1-8D
 31  15031        12           HUMFKBP       FK506 bp
 32  15306        12           HSH2AZ        Histone H2A
 33  15621        12           HUMLEC        Lectin, B-galbp, 14kDa
 34  15789        11           NCY015789     INCYTE 015789
 35  16578        11           HSRPS11       Riboptn S11
 36  16632        11           M61984        EST HHCA13 (IGR)
 37  18314        11           NCY018314     INCYTE 018314
 38  15367        10           NCY015367     INCYTE 015367
 39  15415        10           HSIFNIN1      Interferon induc mRNA
 40  15633        10           HSLDHAR       Lactate dehydrogenase
 41  15813        10           CHKNMHCB   C  Myosin heavy chain B
 42  18210        10           NCY018210     INCYTE 018210
 43  18233        10           HSRPII140     RNA polymerase II
 44  18996        10           NCY018996     INCYTE 018996
 45  15088        9            HUMFERL       Ferritin, light chain
 46  15714        9            NCY015714     INCYTE 015714
 47  15720        9            NCY015720     INCYTE 015720
 48  15863        9            NCY015863     INCYTE 015863
 49  16121        9            HSET          Endothelin
 50  16252        9            NCY018252     INCYTE 018252
 51  15351        8            HUMALBP       Lipid bp, adipocyte
 52  15370        8            NCY015370     INCYTE 015370
 53  1567[?]      [illegible]  BTCIASHI   V  NADH-ubiq oxidoreductase
 54  15795        [illegible]  NCY015795     INCYTE 015795
 55  16245        8            NCY016245     INCYTE 016245
 56  18262        [illegible]  NCY018262     INCYTE 018262
 57  18321        6            HSRPL17       Riboptn L17
 58  15126        7            XLRPL1BRF     Riboptn L1
 59  15133        7            HSAC07        Actin, beta
 60  15245        7            NCY015245     INCYTE 015245
 61  15288        7            NCY015288     INCYTE 015288
 62  15294        7            HSGAPDR       G-3-PD
 63  15442        7            HUMLAMB       Laminin receptor, 54kDa
 64  15485        7            HSNGMRNA      Uracil DNA glycosylase
 65  16646        7            NCY016646     INCYTE 016646
 66  18003        7            HUMPAIA       Plsmnogen activ gene
 67  15032        6            HUMUB         Ubiquitin
 68  15267        6            HSRPS8        Riboptn S8
 69  15295        6            NCY015295     INCYTE 015295
 70  15458        6            RNRPS10R   R  Riboptn S10
 71  15832        6            RSGALEM    R  UDP-galactose epimerase
 72  15928        6            HUMAPOJ       Apolipoptn J
 73  16598        6            HUMTBBM40     Tubulin, beta
 74  18218        6            NCY018218     INCYTE 018218
 75  16499        6            HSP27         Hydrophobic ptn p27
 76  18963        6            NCY018963     INCYTE 018963
 77  18997        6            NCY018997     INCYTE 018997
 78  15432        5            HSAGALAR      Galactosidase A, alpha
 79  15475        5            NCY015475     INCYTE 015475
 80  15721        5            NCY015721     INCYTE 015721
 81  15865        5            NCY015865     INCYTE 015865
 82  16270        5            NCY016270     INCYTE 016270
 83  16886        5            NCY016886     INCYTE 016886
 84  18500        5            NCY018500     INCYTE 018500
 85  18503        5            NCY018503     INCYTE 018503
 86  19672        5            RRRPL34    R  Riboptn L34
 87  15086        [illegible]  XLRPL1AR   F  Riboptn L1a
 88  15113        [illegible]  HUMIFNWRS     tRNA synthetase,
 89  15242        [illegible]  NCY015242     INCYTE 015242
 90  15249        [illegible]  NCY015249     INCYTE 015249
 91  15377        [illegible]  NCY015377     INCYTE 015377
 92  15407        [illegible]  NCY015407     INCYTE 015407
 93  15473        [illegible]  NCY015473     INCYTE 015473
 94  15588        [illegible]  HSRPS12       Riboptn S12
 95  15684        [illegible]  HSEF1G        Elf 1-gamma
 96  15782        [illegible]  NCY015782     INCYTE 015782
 97  15916        [illegible]  HSRPS18       Riboptn S18
 98  15930        [illegible]  NCY015930     INCYTE 015930
 99  16108        [illegible]  NCY016108     INCYTE 016108
100  16133        [illegible]  NCY016133     INCYTE 016133
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TABLE 4 



Libraries: THP-1 
Subtracting: HMC 
Sorted by ABUNDANCE 
Total clones analyzed: 



7375 



1057 genes, for .a total of 2151. clones 

number entry £ descriptor 

10022 - 4 HUMIL1 IL 1-beta . - 

10036 * HSMDNCF IL-8 

10089 - HSLAG1CDN Lymphocyte activ gene 

10060 -HUMTCSM RANTES 

10003 HUMMIP1A ' MIP-1 

10689 . „ HSOP Osteopontin 

11050 NCY011050 INCYTE 011050 

10937 HSTNFR TNF-alpha 

10176 HSSOD Superoxide dismutase 

10886 HSCDW40 B-cell activ, NGF-relat 

10186 HUMAPR Early resp PMA-induc 

10967 HUMGDN PN-1, glial-deriv 

11353 NCY011353 INCYTE 011353 

10298 NCY010298 INCYTE 010298 

10215 HUM 4 COLA Collagenase, type IV 

10276 NCY010276 INCYTE 010276 

10488 NCY010488 INCYTE 010488 

11138 NCY011138 INCYTE 011138 

10037 HUMCAPPRO Adenylate cyclase 
10840 HUMADCY Adenylate cyclase 
10672 HSCD44E Cell adhesion glptn 
12837 HUMCYCLOX Cyclooxygenase-2 
10001 NCY010001 INCYTE 010001 
10005 NCY010005 INCYTE 010005 
10294 NCY010294 INCYTE 010294 
10297 NCY010297 INCYTE 010297 
10403 NCY010403 INCYTE 010403 
10699 NCY010699 INCYTE 010699 
10966 NCY010966 INCYTE 010966 
12092 NCY012092 INCYTE 012092 
12549 HSRHOB Oncogene rho 

10691 HUMARF1BA ADP-ribosylation fctr 

12106 HSADSS Adenylosuccinate synthetase 

10194 HSCATHL Cathepsin L 

10479 CLMCYCA I Cyclin A 

10031 NCY010031 INCYTE 010031 

10203 NCY010203 INCYTE 010203 

10288 NCY010288 INCYTE 010288 

10372 NCY010372 INCYTE 010372 

10471 NCY010471 INCYTE 010471 

10484 NCY010484 INCYTE 010484 

10859 NCY010859 INCYTE 010859 

10890 NCY010890 INCYTE 010890 

11511 NCY011511 INCYTE 011511 

11868 NCY011868 INCYTE 011868 

12820 NCY012820 INCYTE 012820 

10133 HSI1RAP IL-1 antagonist 

10516 HUMP2A Phosphatase, regul 2A 

11063 HUMB94 TNF-induc response 

11140 HSHB15RNA HB15 gene; new Ig 

10788 NCY001713 INCYTE 001713 

10033 NCY010033 INCYTE 010033 

10035 NCY010035 INCYTE 010035 

10084 NCY010084 INCYTE 010084 

10236 NCY010236 INCYTE 010236 

10383 NCY010383 INCYTE 010383 



bgfreq rfend ratio 



0* 


131 


« Da • Uv 


o 


A A7 


«JO • UU 


o 


i A 




u 




46*000 


j 


aZI 


40,333 


u 


20 


40.000 


ft 
U 


17 


34.000 


U 


17 


34.000 


O 


14 


28.000 


O 


10 


20.000 


ft 
U 


9 


18.000 


ft 
U 


9 


18.000 


ft 

V 


6 


16.000 


U 


7 


14.000 


U 


6 


12.000 


n 
U 


^ 


12 .000 


u 


o 


12.000 


ft 

w 


© 


12.000 




AU 


10.000 


o 


c 


1 ft AAA 

IV • UUU 


o 


c 


^ ft ftftft 
aU • uuu 


o 


C 

9 


1 ft ftftft 

AU • uuu 


o 


5 


1 ft fififi 


o 


5 


1 fi fififi 
AU . UUU 


o 


5 


1 ft fififi 


0 


5 


1ft ftftft 

Aw . UUU 


0 


- 5 


10.000 


0 


5 


10.000 


0 


5 


10.000 


0 


5 


10.000 


0 


5 


10.000 


0 


4 
TABLE 4 Con't

number entry descriptor bgfreq rfend ratio

10450 NCY010450 INCYTE 010450 0 3 6.000
10470 NCY010470 INCYTE 010470 0 3 6.000
10504 NCY010504 INCYTE 010504 0 3 6.000
10507 NCY010507 INCYTE 010507 0 3 6.000
10598 NCY010598 INCYTE 010598 0 3 6.000
10779 NCY010779 INCYTE 010779 0 3 6.000
10909 NCY010909 INCYTE 010909 0 3 6.000
10976 NCY010976 INCYTE 010976 0 3 6.000
10985 NCY010985 INCYTE 010985 0 3 6.000
11052 NCY011052 INCYTE 011052 0 3 6.000
11068 NCY011068 INCYTE 011068 0 3 6.000
11134 NCY011134 INCYTE 011134 0 3 6.000
11136 NCY011136 INCYTE 011136 0 3 6.000
11191 NCY011191 INCYTE 011191 0 3 6.000
11219 NCY011219 INCYTE 011219 0 3 6.000
11386 NCY011386 INCYTE 011386 0 3 6.000
11403 NCY011403 INCYTE 011403 0 3 6.000
11460 NCY011460 INCYTE 011460 0 3 6.000
11618 NCY011618 INCYTE 011618 0 3 6.000
11686 NCY011686 INCYTE 011686 0 3 6.000
12021 NCY012021 INCYTE 012021 0 3 6.000
12025 NCY012025 INCYTE 012025 0 3 6.000
12320 NCY012320 INCYTE 012320 0 3 6.000
12330 NCY012330 INCYTE 012330 0 3 6.000
12853 NCY012853 INCYTE 012853 0 3 6.000
14386 NCY014386 INCYTE 014386 0 3 6.000
14391 NCY014391 INCYTE 014391 0 3 6.000
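
The ratio column of Table 4 appears to follow the rule used by the subtraction program of Table 5: the abundance in the target library (rfend) is divided by the abundance in the subtracted library, and a transcript absent from the subtracted library (bgfreq of 0) is assigned a nominal abundance of 1/2. The Python sketch below illustrates that arithmetic only; it is not the FoxBASE+ listing, and the function name `subtraction_ratio` is a hypothetical helper introduced for illustration.

```python
def subtraction_ratio(rfend, bgfreq):
    """Ratio of target-library abundance to background abundance.

    Assumption for illustration: a background count of 0 is replaced by
    0.5, mirroring the 'STORE 1/2' step in the Table 5 listing.
    """
    actual = bgfreq if bgfreq > 0 else 0.5
    return rfend / actual


if __name__ == "__main__":
    # (bgfreq, rfend) pairs taken from legible rows of Table 4
    for bgfreq, rfend in [(0, 3), (0, 5), (0, 20), (3, 121)]:
        print(bgfreq, rfend, f"{subtraction_ratio(rfend, bgfreq):.3f}")
```

With a background count of 0, the ratio is simply twice the target abundance, which is why every row with rfend of 3 carries a ratio of 6.000.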
TABLE 5

* Master menu for SUBTRACTION output
SET TALK OFF
SET SAFETY OFF
SET EXACT ON
SET TYPEAHEAD TO 0
CLEAR
SET DEVICE TO SCREEN
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE '          ' TO Target1
STORE '          ' TO Target2
STORE '          ' TO Target3
STORE '          ' TO Object1
STORE '          ' TO Object2
STORE '          ' TO Object3
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO PTF
STORE 1 TO BAIL
DO WHILE .T.

* Program.: Subtraction 2.fmt
* Date....: 10/11/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Subtraction 2



e raas 87.m< swr -extraction iuS- km isfai mot loiJK- 274een» no,,,, 

G KXEW.117.J26 OCT DttTCK STOLE 65536 TOOT «CbiL^12 KCTOiffi ^I^S&T'L. 
e-fZXELS 135..126 OCT WftTCK «WLE SSS36 FONT • Chirac- 12 fiSSS; ^ 15,63 00 

0 PIXELS 252,236 GET terainate STOLE 0 FCNT •Geneva' 12 fi*!2P ?5 ?a wn?« i 1 I i • . 

e pixels so.aee td'ibi.397 sklz j 8 7i color o o.1i.-2«6o -i -i 

! UZSi 2£ 'f* c)t sreund:' 6THX 6S536 KJOT •tara-.no COLOR 0 0.-1 -1 -1 -1 

■ H I ESS-ESSES 5=E! ££ SB SS S ; • i- . 

o SSSf ?n2-SL t 5L t *? et3 0 PaiJT , 0«»«va , .9 6IZE 12 79 COLOR 0 0 -1 -1-1 3 

2 £££ S£ °£« Ct i £1VLE 0 'G«>«va",9 GIZZ 12.79^0L0R O.O.-i.-i -i^i 

! Hf *2S2 CET "*a«et2 STYLS 0.F0NT 'Cimeva'^ SIZE 12 79 O3L0R 0 0 -1 Ii Ii ■? 

6 fHELS 162.299 GET object 3 STYLE 0 FONT -Geneva' 9 SIZE 12 79 COLOR o'S l ?' J' i 
J FIXELS 276.324-GET Bail 6KLE 65536 FONT 'Oi^'.'nnalZ Wto?i2f fci'Sa 4112 
* EOF: Subtraction 2.fmt

READ
IF Bail=2
CLEAR
CLOSE DATABASES
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF
STORE VAL(SYS(2)) TO STARTIME
STORE UPPER(Target1) TO Target1
STORE UPPER(Target2) TO Target2
STORE UPPER(Target3) TO Target3
STORE UPPER(Object1) TO Object1
STORE UPPER(Object2) TO Object2
STORE UPPER(Object3) TO Object3
CLEAR

SET TALK ON

GAP = TERMINATE-INITIATE+1

GO INITIATE

COPY NOCT GAP nnj)S NUM3QI. library. D Pa'p "WrW ~e Wrr-i T . l... 
COUNT TO TOT

ff^S TO ^* ,B,, " tt ^ ,0 ' , - m "'^ , « ,r ^'* l -«.i>- , x : " 

COPY S TKPCT TO TO TEMPDSSIC *'. t: 

.USE TOiFDESIC 
- 7 &oatch»l • • • , 

A FFEtt D FROM FOR Da*B' 

IT*teacchal rc " * * 

A PPEN D FROM TOffMUM FOR E='K' 

INDIF 

ZF OcfitCbil 

Appan? FRb^'raawuM for Ds'o 1 

• B9DXF 
IF Unatehsl 

APPEND FROM nWMM FOR Db«J». OR Ds'X' 

OODWT TO STARXOT . 

COPY STRUCTURE TO TEMPUB 
USE TTMRJE • • 

^ P ^^S^^ IPDESIG f° R lltoraj y^ER(targetl) 
""ND FROM TEMPDESIG FOR library-UPPER (targets) 
IF target 3<>» .-.**■".•*■ - 
APPIND FRflM TTODESIS FOR lihrary.DPPER (target) 
OOCKT TO ANALflOT 

use ro mgsi c . 

GOFT STRUCTURE TO TCMPSUB ~* 
USE TEMPSUB 

W^M^FROJ^TEMPOESIG FOR lihr&rystJFFER(Objectl) 
^P^KD FRB1 TO1PBES2G FOR, libra ry=UF PER ( Object 2) 
IF terget3o' 

•APFDO FBCW TB1FDESIC FOR lihraryrUPPER(CbjtCt3) 

O0U BH' TO fcUB lWlCiU] 1 
SET TALK OFT" 

* COMPRESSION SUBROUTINE A

? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
SORT ON ENTRY, NUMBER TO LIBSORT
USE LIBSORT
COUNT TO IDGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= IDGENE
PACK
COUNT TO AUNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
STORE D TO DESIGA
SW=0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
STORE D TO DESIGB
IF TESTA = TESTB .AND. DESIGA = DESIGB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL

SORT ON RFEND/D, NUMBER TO TEMPTARSORT
USE TEMPTARSORT
* REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO

* COMPRESSION SUBROUTINE B

? 'COMPRESSING TARGET LIBRARY'
USE TEMPSUB
SORT ON ENTRY, NUMBER TO SUBSORT
USE SUBSORT
COUNT TO SUBGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= SUBGENE
PACK
COUNT TO BUNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
STORE D TO DESIGA
SW=0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
STORE D TO DESIGB
IF TESTA = TESTB .AND. DESIGA = DESIGB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL

SORT ON RFEND/D, NUMBER TO TEMPSUBSORT
* USE
* REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPSUBCO
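
The two compression subroutines above sort a library by entry, collapse runs of identical records (same entry and same designation) into one record, store the run length in RFEND, and then re-sort by descending RFEND. A minimal Python sketch of the same idea follows. It is an illustration under stated assumptions, not a translation of the listing: the function name `compress_library` and the `(entry, designation)` tuple layout are hypothetical, and ties are broken by entry name rather than by clone number.

```python
from collections import Counter


def compress_library(clones):
    """Collapse duplicate clones into (entry, designation, rfend) rows.

    clones: iterable of (entry, designation) pairs, one per sequenced clone.
    Returns rows sorted by descending abundance (rfend); ties are ordered
    by entry and designation (an assumption; the listing uses NUMBER).
    """
    counts = Counter(clones)
    rows = [(entry, desig, n) for (entry, desig), n in counts.items()]
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


if __name__ == "__main__":
    library = [("HSRHOB", "E"), ("NCY010031", "I"),
               ("HSRHOB", "E"), ("HSRHOB", "H")]
    for row in compress_library(library):
        print(row)
```

Counting with a hash map avoids the sort-then-scan pass that the record-oriented FoxBASE+ code needs, but the resulting abundance table is the same kind of object: one row per distinct transcript with its clone count.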



? 'SUBTRACTING LIBRARIES'
USE SUBTRACTION
COPY STRUCTURE TO CRUNCHER
SELECT 2
USE TEMPSUBSORT
SELECT 1
USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0
DO WHILE .T.
SELECT 1
MARK = MARK+1
IF MARK>BAILOUT
EXIT
ENDIF
GO MARK
STORE ENTRY TO SCANNER
SELECT 2
LOCATE FOR ENTRY=SCANNER
IF FOUND()
STORE RFEND TO BIT1
STORE RFEND TO BIT2
ELSE
STORE 1/2 TO BIT1
STORE 0 TO BIT2
ENDIF
SELECT 1
REPLACE BGFREQ WITH BIT2
REPLACE ACTUAL WITH BIT1
LOOP
ENDDO

REPLACE ALL RATIO WITH RFEND/ACTUAL

? 'DOING FINAL SORT BY RATIO'

SORT ON RATIO/D, BGFREQ/D, DESCRIPTOR TO FINAL

USE FINAL

SET TALK OFF

DO CASE
CASE PTF=0
SET DEVICE TO PRINT
SET PRINT ON
EJECT
CASE PTF=1
SET ALTERNATE TO 'Adenoid Patent Figures:Subtraction.txt'
SET ALTERNATE ON
ENDCASE

STORE VAL(SYS(2)) TO FINTIME
IF FINTIME<STARTIME
STORE FINTIME+86400 TO FINTIME
ENDIF
STORE FINTIME - STARTIME TO COMPSEC
STORE COMPSEC/60 TO COMPMIN
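
The timing step above reads a seconds-since-midnight clock at the start and end of the run and adds 86400 seconds when the finish reading is smaller than the start reading, so that a run spanning midnight still yields a positive duration. A short Python sketch of that correction is given below; the function name `elapsed_minutes` is hypothetical, and the sketch assumes the run lasts less than one day.

```python
SECONDS_PER_DAY = 86400


def elapsed_minutes(start_sec, finish_sec):
    """Elapsed minutes between two seconds-since-midnight readings.

    Assumes at most one midnight rollover between the two readings.
    """
    if finish_sec < start_sec:
        # The clock wrapped past midnight; shift the finish time forward.
        finish_sec += SECONDS_PER_DAY
    return (finish_sec - start_sec) / 60


if __name__ == "__main__":
    print(elapsed_minutes(0, 120))
    print(elapsed_minutes(86000, 200))
```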



-SET MARGIN TO 10 

81,1 BAY •Library Subtraction Analysis' STYLE €5536 FONT •Geneva',™ COLOR 0,0,0,-1,-1, 

* ■ , 

7" ... 
7 - 
7 date'C) 

7? » * . - 
77 TOfflO 

7 'Clone ambers ' 
' 77:£15l (OTTIATE, 5, 0) 
,7? ' through 1 * 
?? ETR(TERKHaTE, 6,0} 
T^iibraxiMi 1 
T^rargetl 
IP Target3o' 

77 », '■■ • - w 

77 TfcrpeW * 

INCUT =' 

IP 7ferget3o*. 

7? ', • ' - 
7? Target3 



7 •Subtracting; 

7 Objectl 

IF-Cfc>ject2«' 

?*-\. r 

77 Objects 

ENDZP 

IF Object3<>' 

7? \ ' 
7? Objects 
QjjQXF . 

7 'Designations r .* 

IF Bnatch»0 .AND. Hmateh=0 ,AKD. Creatch*0" .AND. IHATCH*0 

77 ^All 1 

BSXP 

IP Bnatch>l 
?? '&caet,* 
STOP 

ZF toatetul 
TTJHumaa, ' 
ENDC 

•IP Onatdtel 
7? 'Other ep. • 
8QXF 

IP Imatcb-1 
7? • •IMJXTi' 



•IP HOOtfl 

7 'Sorted by ABUNDANCE 1 

ENDXP. 

IP ANAL»3 

7 'Arranged FUNCTION' 
B3DXF 
? 'Total clones represented: '

?? CTR(TOT,5,0) 

? 1 Total -denes Analyzed* 1 

?? OTUSTARIOT,5,0) 

? 'Total, rarputation. tins: 

?? • iftinutas' * 

"■ •■J^ - ^2^^* ? ^ - **" r« function . . ^ ie , , . ^ 
•S 1 ."" 0 * 40.a a8 «,« a Paa6 .^.^ ^ e 

CASE ANAL-1 
/ :?7*» ccaes, f or a total of > - 

1 clone*' 

7 > ' . . 

SCREOJ 1 TYPE O'HKWUIC •Scraen 1" AT 40 2 crrr Vn* *tv- A _ 

CLOSE* DA3ABASES 

•USEVfin«rtC^:FccxBASE+/«ac:fax files : clones. <bf 

CASE.»AU2 
• • arrange/function 
SOT TJUNTCK 
SET HEADING ON 

. 1 ' W7B. 0 HEMmc Screen 1 V XT 40,3 SXZE »«.«, TOELS ■„ COLOR 0 

"J BINDING ROTSmS' 

.r^^crS 1 2c^ 1 , ^^> T <0 ' 2 " a *" «*« •Helv rt i«.. 2 65 COLOR 0 

SCREE* 1 TTPZ 0 HEADING 'Screen l 1 XT 40 2 err? ?bc ^o-, n^„, * ' 

SCREEN 1 OTOE 0 HEADING 'Screen !• AT 40 2'STZr 2Pfi ao-* btvw«. . * 

list GET fiel* ^.».r.2.H.E^s^^f|^f^^- G ^:3,^ °' M ' 

TvU^'ju^F?* l " » < 0 ' 2 imiM-m-IBM-'w' •H. 1 veti«., 2 «5 COLOR 0 

,6CRBEN 1 TYPE 0 tEADXNG "Screen 1' AT 40 2 ST2T sac 40-5 *-v»te — 

list OFT fi.l* ■^.W.MU^.Si,^ 0.0.0. 

!2Sii 1 Sgt;f! mB *' XT <0 > *** -Heavecic.^es col* o 

BFK ^ -^f^^^ 0,0,0. 

pHEN 1 T«E 0 HE*** . B cx«n ^ ^ .„ elv( * ica .. 268 ^ „ 

H^J^ct^ ,SCr ~ l " A T «< 2 «" roOT -.H el v Bt ica.. a « COLOR 0 

SCREEN 1 WPE 0. HEADING *Scr»«n V AT .40 2 fftzr am MVW( . . 

list OFT fltf* r^.D^^^TsfS^C^^fl^f^.^. i 4 ^:^. 00 ^ 0.0.0. 

l^aS;^ »' » «' 2 " 2£ <« «« *N«lvatiea*, a C5 COLOR 0 

SCREEN 1 WPE 0 KEACCENG a Scr««n 1" AT 40 2 ST7P oqc ao-* . ' 

li.t OF* fields »^.vWifft!U^!^ o;o,0. 
SCRE37 1 T5TPE 0 HEADIHC -Screen 1- AT 40 2 RVrr *oc 

? 'Viral elereatsi* * T 40,2 612E 3B6 '«2 riXELS row -Helvetic* -.365 COLOR 0 

SCRSEK 1 TVPE 0 HEADING •Screen 1" AT 40 2 smv lae a*- m - 

?• 'Ricaee* and Phosphatases!* «,2 SIZE 386,49. FSGLS FCOT -Helvetica-, 2« COLOR 0 

SCHEQJ 1 "TYPE 0 HEADING "Screen 1* AT 40.2 ST?t 5u* /e- . 

• ^SZi^W^ * «' 3 « «M» . ^ .He^icav Jff5 COW* , 

-BCRQN 1 TYPE 0 HEADING 'Screen 1* AT 40 2 no* 

- bCRilN 1 TYPE 0 HEADING •Screen 1i if in ? ^ 
- SOffiQJ 1 WPS 0 HEMJ3NQ 'Screen 1* AT 40 2 ETZ2 ?b* i« 

CCREm 1 TYPE 0" HEADING •Screen 1» AT 40 2 st?v 9bc jc* ~* * 

. ? 'Translation: - AT « D ' a TO 2B MS2 PIXELS FONT -Helvetica- 265 COLOR o 

SCREEN 1 0YPE 0 HEADING. -Screen 1* AT 40 2 sitt i*c a*-> ' 

list off -fui*, ■f^^r.^iu W.fSc^^^. «J^:J.~ < f ' 0 ' 0 '' 

SCREEN 1 W?E 0 HEADING 'Screen 1* AT 40 2 wr ->*c a^ 

iut opt f ieWs ^.o;^^:^^^^ ^.^ o.o;-o; 

■kw saw 

.SCREEN 1 TYPE 0 HEADING •Screen 1' AT 40 2 sr?r ooc 

» ( • - 2 SI2C 286 ' 492 ^P 3 " ■ WW "Helvetica- ,268 COLOR 0 



SCREEN 1 TYPE 0 HEADING 'Screen 1« AT 40 2 cttt 3oc a**> 

li~ OFT field, *».F.I.IU^^Si^^^ 

SCREEN 1 TYPE 0 HEADING -Screen 1* it *n <i »~,«. 

f^^^SoSSS.S.f 6 . AT "' 2 6128 2B$ '« 2 «* .»*iv.tic... a65 core* o 

SCRHN 1 TYPE 0 HEADING -Scr»en 1- AT 40 2 sm 5qc i« - 

f^J SC 88 ' ,6c ^ en 11 * T "' 2 •» ^ F0OT .H.iv.tic... 2 « color o 

SCREEN 1 TVPB 0 HEADING 'Screen 1* AT 40 2 fitrr 5oc* j 0 -» ^ 

n,t OFT fid* n^ (D . K , 2 . R(E ^? s ^^ 0,0.0. 
fSi S LE,;^ 01 11 Ai "' 2 S1 « ««•« ™ n«T .H^tic...»s ecu* 0 

ECREQt 1 1YFE HEADIBC 'Screen »• AT 40.2 SJ2S 2B6 ,„ 2 ^ ^ .^.^ ^ „. Q 0 . 
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='M'

fSEAf?M-!!5Ki3?~ x " " 40,3 OTE JM,<S2 ^ -*whsm1--96 a&* , 

SCREEN l.WPB 0 'BEADING 'fimeal' ASP 40,2 SIZE 286,492 PIXELS FWT ; Geneva-,7 ™T*P 0.0 ft' 
iiat.OFF fields »uraber,D,r,E.R,&W.6,OESCR^ roR r.«n«^^ 

r?Sid StLStS^ "^ CreCn ^ 40 ' 3 m *! MM ^ M;LS ' ^ 'Helvetica-, 265 cotCR 0 
SCREari TOPE 0 HEADS© -Screen 1- AT 40,2 SIZE 286,492 PIXELS PCNT -Geneva",? COLOR 0.0 0 
• list OTP fields amber,D,F,Z # R ( B7ntf,S,DESCRIF^ for rJw 

'PSbJ Z^f?*™® ,SCreCr *" " 40,2 * e «'<92 PIXELS PQJT -Helvetica', 265 COLOR 0 

f?°^Li Sf^J HEOTO-'Bwm 1- AT 40,2 SIZE 266,492 PIXELS PCNT * Geneva 7 COLOR 0,0,0 
list OFF fields nu^r,D,P,Z ( R,ENTO,6,p£CKPTC», n>R R*'E'^^ 

SCRESJ 1 TOE cfHEADTC* -Screen/1 ' AT^o J kZE 286,452 PIXELS PCNT -Helvetica-, 2 68 COLOR 0 
? 1 ' •* '" ! HTSCELIAMBODS OOTtWRXES' " 

^f^^a'rSSn0^ 1 ^ ******* " 4 °' 3 OT " MS * WXELS >Helv «tica-,265 COLOR 0 

SCREBT1 TYPE 0 'Screea 1- AT 40,2 SIZE 286,492 PIXELS PONT "Geneve-, 7 COLOR 0 0 0 

list OFF fields mmter,D,F;Z,R. Em?, S, DESCRIPTOR FOR fc'H'^^ 

f^ruccSrll:? " 8 I~ *" 4 °' 2 ™ * BMM " PIXELS ,H **vetica\365 CCLOR'O 

TPL 0 KE ^ ING ^•| c r ecn *' XT 40,2 SIZE 286,492 PIXELS POTT -Geneva'-, 7 COLOR 0.0 0 
list OFF fields. atnber,D,F,Z,K.a^,*,B2£^ °' 0 ' 0, 

cTctg^^ 2 ** 3 ' Scr *^ a " . AT 40:2 m 286,492 PIXELS PCNT • Helvetica -.3 65 COLOR -0 
SCREDJ1 TYPE 0 ffiMIO •Sereea I* AT 40.2 SIZE 286,492 PIXELS * PCNT* •Geneva*. 7. COLOR 0.0 0 
list OFF fields nm^r,D,P, 2,R,D?TKy r S,tJKCRlPTOR # BGPRfiO,RFBffi,RAiriO, I FOR JW'X'^^ 

-f^L^unlcn^^t^ *' * 40 ' 2 ™ 2B6 ' <52 PD3LS W -Helvetica-,265 COLOR 0 
SCREEKl WPS 0 HMIN3 -Screen 1- AT 40,2 SIZE 286,492 PIXELS KOT 'Geneva', 7 COLOR -0,0 0 
list OFF fields nur^,D,?,Z.R,a^,s,BESCIUro^^ TORR.^' ' 

ENDCASE

DO "Test print.prg"

SET PRINT OFF

SET DEVICE TO SCREEN

CLOSE DATABASES

ERASE TEMPLIB.DBF

ERASE TEMPNUM.DBF

ERASE TEMPDESIG.DBF

SET MARGIN TO 0

CLEAR

LOOP

ENDDO
* Northern (single), version 11-25-94
CLOSE DATABASES
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE '          ' TO Eobject
STORE '          ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 2 TO Bail
DO WHILE .T.

* Program.: Northern (single).fmt

Bate.*.. i 8/ fl/Sfl . . • 
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)

SCREEN 1 5VPE 0 HEADING *£er#en i» «» ' T ... _ 

b rams uo.aa get b58 VStS 1 S , SSS5 ?^f?Lf^ Jfi 3 i_ «w •o^CS^i'md 



•'BOP: Northern | Bangle) . fat 
READ 

17 Bailed 
CLEAR ■ 
icrcen 1 off 

ROTOR N 

AT ^ r ^ jF ° 5KBAS ^ /MaC:PQX 
I? Bo tpecto' 

STORE UPPER (Eobject) to Eebject 
SET SAFETY OFP , J 

SET SAFEIY CM . • ih *** m 

USB 'Lookup entry. dbf* 
LCCAIE FOR Look^object 

era* 

LOOP 

sss? 



SX0R2 Entry TO Searchval* 

GLOSS DATABASES 

ERASE .'Lookup entry, dhf- 

QJDIF 

•XP-Dobjeeto' • 
GST S CACT OFF 
SET SAFglY OFP 

to iPt ° r - ro ,Lee * a P*-c*iptor.ebf 
USE "Lookup desariptor.dbt* 

CLEM. ■ 
LOOP
BR0H9E 

STORE Bitiy TO Se&rcfavkl 

CLOSE DATABASES 

ERASE •Lookup descriptor .dbf 1 

SET EXACT OK 

BdF - 

IP NuwboO 

USE *GmartO^:ta£AC&t/Mactrax files : clones. dbf» 
GO ttmto 

.CTPHE' Entry TO Se&rcbv&l ~ 
' B83XF 

am 

^•Northern analyst! for entry * 
" Sear«taval 

* . / 

*-? •feter V to proceed^ • 
WJT TO OK • 
CLEAR 

IF UF?SUCK)o'y< 
screen 1 of! 
•H EIUK N 
BCffF ' 

..•<■" r* ■ . 

* CCW?RIS£ICW*SUBKOUTIKE FOR Lihrary.dbf 
7 •Ccspreaeing the Libraries file now;-..* 

USE , 6&artCuy:FtoxRASE4/Kac:Fox files : libraries. dbf 
SET SAFETV OW 

SORT ON library TO *CcBpxeeeed libraries. dbf • 

* FOR eate reo>0' 
SET SAFE3Y ON 

tZSE •Ccopreseed libraries-. dbf 

DELETE FOR entered 

PACK 

COUNT TO TOT 
MMUd - 1 
SW2»0 . 

00 NKZUE SW2»0 RQLX 
'IF HARIp. >» TOT 
FAOC . 
SW2-1 
LOOP 
ZHEXF 

GO H ARK1. 

' STORE library TO TESTX 
' SKIP . 

STO RE L ibr ary TO TCSTO 

IF TESTA * TESTB * 

DDZF 

Mwoa « warki+i 

LOOP ' 
D3DD0 ROLL 

* Northern analysis 
CU3A 

7 'Dole? the northern now. . , 
SET TALK ON 

USE , gagrtOqyiFoxaASE+/Kactrox files t clones. dbf*' 
SET EAFVn OPT 

COPy TO H ita.dbff' FOR eatryaeearchyal 
SET SAFETY OH 
* MASTER ANALYSIS 3; VERSION 12-9-94
* Master menu for analysis output
CLOSE DATABASES
SET TALK OFF
SET SAFETY OFF
CLEAR
SET DEVICE TO SCREEN
SET DEFAULT TO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:"
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO CONDEN
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTF

DO WHILE .T.

* Program.: Master analysis.fmt
* Date....: 12/ 9/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis



SCREEN 1 TYPE 0 HEADING "Screen l - AT 40.2 £12- 2fi6 DTvrre - - 

fi PIXELS 39,255 TO 277'. 430 STYLE 26447 COLOR oTo.^SleoO^-l^ eeneva '' 9 TOWR "'0.0. 

6 nms 75,120 TO 178,241 STYLE .9871 COLOR. 0,6^i,: 2 5600 -1 -1 

i l^i ll'll S£ Oatput Menu- STYLE 65536 F^G^- 274 COLOR 0 0 1 1 1 

6FIXELS 153.126 GET CM&TCH sSSEi Is536 f3? -SiSS-'li Srae -2£ S£° l0g0U f '.P* i 5 ' 1 

I l^i %'£ 2 £? 'tot***:' STYLE 65536 MI?S^Pa!f35l S S S?5 ff-l^ 15 ' M 
9 PIXELS 63,54 SET PR1NTON STOLE 65536 FCNT -Chicaco- U sinroT-Mr t- ! J' ! 1 , . , 
6 PIXELS 171.126 GET toacch STYLE 6553 6 WW ^Chicloc 1 L S -8^^™.°^ 
8 PIXELS 252.146 GET initiate STYLE 0 FONT 'Geneva • f 12 SIZE 15^70 COL§r ST , ^ ZE 1 lS ; 65 00 
6 PIXELS 270.146 GET terminate STYLE 0 FONT S^.U £ 15 70 fflSl o o I ! 5* 1 , 
8 PIXELS 234.134 SAY -mei^ clones • STYLE 65536 FONT Gmeva 12^dSi D o" i i i 1 
Q pixels 270.125-SAY ■->• STYLE 65536 FOOT -Geneva -fl4 COLOrTo-1 0.0.-1.-1,-1,-1 
8 PIXELS 198.126 GST PTF STYLE 65536 FOOT 'Chfcaob' 12 PICTURE -@»C ^int' ll e,- ♦ » * 

6 PIXELS 189,0 TO 257.120 STYLE 3871 COLOR oToTT-aMOO -1-1 ilC SI2t 15,5 

6 PIXELS 209.8 SAY -Library election- STYLE 65536 FONT -Geneva- 266 color n a , , , , 
C PIXELS 227,18 GET ENTIRE STYLE 65536 FOOT -Chicago^ t^'%tmffi,i&££'£&\ t 



* 

* EOF: Master analysis.fmt
READ

IF ANAL=9
CLEAR
CLOSE DATABASES
ERASE TEMPMASTER.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF

? INITIATE
? TERMINATE
? CONDEN
? ANAL
? Ematch
? Hmatch
? Omatch

SET TALK ON

IF ENTIRE=2
USE "Unique libraries.dbf"
REPLACE ALL i WITH ' '

BRO^ FIELDS i.lihnme, library. total t0ttt id AT 0,0 

ENDIF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
IF ENTIRE=1
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
ENDIF
IF ENTIRE=2
USE "Unique libraries.dbf"
COPY TO SELECTED FOR UPPER(i)='Y'
USE SELECTED
STORE RECCOUNT() TO STOPIT
MARK=1
DO WHILE .T.
IF MARK>STOPIT
CLEAR
EXIT
ENDIF
USE SELECTED
GO MARK
STORE library TO THISONE
? 'COPYING '
?? THISONE
USE TEMPLIB
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" FOR library=THISONE
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COUNT TO STARTOT
COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
APPEND FROM TEMPLIB
ENDIF
IF Ematch=1
APPEND FROM TEMPLIB FOR D='E'
ENDIF
IF Hmatch=1
APPEND FROM TEMPLIB FOR D='H'
ENDIF
IF Omatch=1
APPEND FROM TEMPLIB FOR D='O'
ENDIF
IF Imatch=1
APPEND FROM TEMPLIB FOR D='I' .OR. D='X' .OR. D='N'
ENDIF
IF Xmatch=1
APPEND FROM TEMPLIB FOR D='X'
ENDIF
COUNT TO ANALTOT
SET TALK OFF
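
The selection step above builds the working set of clones by designation code: when none of the Exact, Human, Other-species, or INCYTE flags is set, every clone is kept; otherwise only clones whose designation field D matches a selected flag are appended. The Python sketch below illustrates that filter. It is a simplified illustration, not the listing itself: the function name `select_by_designation` and the dictionary record layout are hypothetical, and a set of wanted codes is used so that a record is never selected twice, whereas the sequential APPEND statements in the listing could copy an 'X' record once per matching flag.

```python
def select_by_designation(clones, ematch=0, hmatch=0, omatch=0,
                          imatch=0, xmatch=0):
    """Return the clones whose designation code D matches the chosen flags.

    clones: list of dicts, each with a 'D' key holding a one-letter code.
    With no E/H/O/I flag set, all clones are returned (as in the listing).
    """
    if not any((ematch, hmatch, omatch, imatch)):
        return list(clones)
    wanted = set()
    if ematch:
        wanted.add("E")
    if hmatch:
        wanted.add("H")
    if omatch:
        wanted.add("O")
    if imatch:
        wanted.update({"I", "X", "N"})
    if xmatch:
        wanted.add("X")
    return [c for c in clones if c["D"] in wanted]


if __name__ == "__main__":
    sample = [{"D": "E"}, {"D": "H"}, {"D": "X"}]
    print(select_by_designation(sample, hmatch=1))
```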



DO CASE 
CASE PTF=0
SET DEVICE TO PRINT
SET PRINT ON
EJECT
CASE PTF=1
SET ALTERNATE TO "Total function sort.txt"
* SET ALTERNATE TO "H and O function sort.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance sort.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance con.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Function sort.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Distribution sort.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 1:Clone list.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Location sort.txt"
SET ALTERNATE ON
ENDCASE

IF PRINTON=1

gU30 SAY -Database Subset Analysis' STSfLE 6:536 FOOT 'GanBva\274 COLOR 0,0,0,-1,-1,-1 

?
?
?
? DATE()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
? 'All libraries'
ENDIF
IF ENTIRE=2
MARK=1
DO WHILE .T.
IF MARK>STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
? ' '
?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF

? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
?? 'All'
ENDIF
IF Ematch=1
?? 'Exact, '
ENDIF
IF Hmatch=1
?? 'Human, '
ENDIF
IF Omatch=1
?? 'Other sp.'
ENDIF
IF Imatch=1
?? 'INCYTE'
ENDIF
IF Xmatch=1
?? 'EST'
ENDIF

IF CONDEN=1
? 'Condensed format analysis'
ENDIF
IF ANAL=1
? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
?? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
? 'Arranged by FUNCTION'
ENDIF

? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?

7 U B> library d * designation 1 « distribution 2 = location r * function c - cex 

USE TEMPDESIG * 
SCREE^l TYPE 0 HEADING "Screen 1- AT 40,2 SIZE 286,492 PIXELS FOOT -Geneva'*, 7 COLOR 0,0 r 0, 

CASE ANAI**1 

• eort/numbar 
SET HEADING ON 
IP CCNDENal 

SORT TO TEMPI ON ENTRY, NUMBER 
DO -COMPRESSION nirttoer.PRG' 



SORT TO TEMPI CN NUMBER 
USE TEMPI 

liet off fields number, L,D, F. 2, R,C# ENTRY, S, DESCRIPTOR 
CLOSE XtABASES* OUI ^' L ' D ' F ' 2 ' R '^ 
BVL55 TEMPI. DBF 
EMDIF 

CASE ANAU2 

* aort/DESCRIPTOR 

SET HEADING ON 

•SORT TO TEMPI ON DESCRIPTOR, ENTRY, NUMSER/S for D» 'E' .QR.Efe'K' .OR.Ds'O* OR D*'X» OR D»'T< 
•MOT TO TEMPI ON ENIW, DESCRIPTOR, NUMEER/ S for D*'I< OR £'H' A iftfrx' 'S'S'I' 

SORT TO TEMPI ON ENTRY, START/ S for D= 'E' .OR.Ds »K* .OR.Ds'O 1 -OR,D= 'X'ToR.D* ' I' * 
IP CONDQfel 

DO -COMPRESSION entry. PRC* 
USB 

USE TO4P1 

j£L£J£5" nur ^' L <D'F'Z'K'C,D7TRY,S.^ 
CifiSB DATABASES 
ERASE TEMPI. DBF 
EMDIF 
CASE ANAL=3

* tort ty abundance 

-SET K2AD2KG ON . * . J 

"^T™^^ ^ Y '^ER for D« i r.OR.D s 'H\OR.I)=-O'.OR.Cx'X'.OR.0-*r' 
DO •CCWFHESSION abundance, prg • A ,wn,y " A 

/.CASE AKAL-4 

* sort/interest 
SET HEADING GN 
IF CONDEN«l 

SORT TO TEMPI ON FOTKY, NUMBER FOR I>0 
DO" 'COMPRESSION interest . PRO* 

else ' 

SORT W I/D.ENIBY .TO TEMPI FOR I>1 
USB TSKFl 

ERASE TEMPI. DBF 
INDIF 

CASE ANALbS 

* arrange/location 

SET HEAD2N3 CM ... 
STORE 4 TO AMPLIFIED 
7 'Nuclear: 1 

SORT ON EOTRY#NUMEER FIELDS RFE2D, KUKBER L D F 2 B n nmv e rre«rm^« » _ 

DO •Ccnpressicn location. prg" 
•ELSE 

DO "Normal subroutine 1* 
EKDIF 

? 'Cytoplasmic: 1 

DO ■Ccnpreasion location. pry* 
ELSE 

DO •Normal subroutine l* 
? 'Cytbskelecan: 1 

»«»™'>"m fields R n OT .y^.L, D .r, 2 .R,c ( ^y.s,r^cKi^R.i ra , INIT , I , C0 ^ 

DO * Compression location. pry" 
ELSE 

D O 'N ormal subroutine 1" 
QuDXF 

? 'Cell surface: ' 

DO *Conpr«9sion location. prg" 
HjSS 

DO*Normal subroutine 1* 
? 'Intracellular membrane: ' 

DO "Canpressicn location. prg" 

DO •Nornal subroutine 1* 
END1F 

? 'Mitochondrial; ' 

^RT^TCRr^NUMBER KELDS ^.W^.L.D.^^ 

DO •CcRpreacion location. prg" 
ELSE 

DO "Normal subroutine 1* 
ENDXF 
? 'Secreted:'

SORT CN IHTOY, NUMBER FIELDS RTB^^l^SZ^^L^D,?, Z.RX.nTTRY, £,DESCRIPTOK*LSCGTC, INTT, I.CCtfcffJJ 
IF.CGNDafcl 

DOVCcnpfeaaioa location.prg' 
ELSE 

DO •Normal subroutine I 1 

ENDIF - 

7 'Otheri' ' 

SORT ON D7HIY, NUMBER FIELDS RFEND,NUMBEB.L,D,F,Z,R,C, ENTRY, S, DESCRIPTOR, LOCK, INXT,I ( CCt**EN 
IF CCNDEN*! ■ ' 

DO 'Compression location. pro* 
pT»PF 

DO. •Normal subroutine l» 
ENDIF 

? 'Untoown: ' 

SORT ON DJTRY, NUMBER FIELDS RFEND, NUMBER, L,D,F,Z,R,C.ENm,S, DESCRIPTOR, L2*3TK, HOT, 1,006*13* 
IF CCNDEJJ=1 

DO ■ Compression location .prg 1 
ELSE 

DO "Normal subroutine 1' 

ENDIF. 

IF CCNTTQfel 

SET DEV ICE. TO PRINTER 

SETPRDTIER CN 

EJECT 

DO "Output heading. prg* 
USE • Analysis location. dbf" 
DO "Create bargraph. prg 1 
SET -HEADING OFF 

? • FUNCTIONAL CLASS TOTAL UNIQUE NEW h TOTAL 1 

o 

LIST OFF FIELDS Z .NAME, CLONES, GOTES, NEW. FERCENT, GRAPH 
CLOSE DATABASES 
ERASE TEMP2.DBP 
SET HEADING ON 

*USE "SmartGuy:FoxBASS*/Mac:fox files :TEKFMASTER. dbf 
ENDIF 

CASE ANAL- 6 

* arrange/distribution 

SET HEADING ON 

STORE 3 TO AMPLIFIER 

? 'Cell/tissue specif ic d istribution i * 

SORT CM DflTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F , Z, R, C. ENTRY, S. DESCRIPTOR. LETCTO, INIT,I,CO»(EN 
IF CCNDENsl 

DO "Compression discrib.prg" 

DO •Normal subroutine 1' 
ENDIF 

7 'Non-specific distribution: ■ 

SCOT CN INTOY, NUMBER FIELDS RFIND, NUMBER, L,D,F,Z,R,C. ENTRY, S, DESCRIPTOR, LENGTH, HOT. I ,CO»<Bf- 
IF C0NDEN*1 

DO "Caapression distrib.prg" 

DO •Normal subroutine 1" 
INDIF 

? 'Unknown distribution: ' 

SORT CN ENTRY, NUMBER FIELDS RFEND, NUMBER, L,D,F, 2, R,C,E^Y.S,DES(^PTOR,m«rrH,INIT,I,CXJMME« 
IP CCNDEN*! 

DO ■Cotpreesion distrib.prg* 
ELSE 

DO ■Noxztal subroutine l* 
ENDIF 

IF CCNDDJel 

SET DEVICE TO PRINTER 

SET PRINTER ON 
EJECT

DO -."Output heading. pre;* 
,j OSS * 'Analysis distribution. dbf 
: DO •Create barffreph.prs' 

* 'SET HEADING OFF 

-\ ? FUNCTIONAL CLASS <**m^ 

r ? •. >. . ..v.. TOTAL UNIQUE % TOTAL' 

ERASE T^M^i.DBF 
SET HEADING CN 

^•SmartGuy:FoxBASE + /Mac:fox fileismwwiSR.ft,. 

■j r * 

• CASE ANAL=7 

' *** arrange/function 

• SET HEADING ON 
STORE 10 TO AMPLIFIER 

y BINDING PROTEINS ' 

T^Surface molecules and receptors!' 

SORT ON DTTRy.NObBER FIELDS Ri^ND nhmppr t n v » » ~ - 

IF CONDENsl ^ 5 ' M ^' UD ' F ' 2 ' R ' C '^^^ 

DO •Caapression function.pro' 

ELSE 

DO 'Normal subroutine 1' 
D©IF ■ 

? 1 Calcium- binding proteins: • 

SORT ON ENTRY, NUMBER FIELDS RFEND *Jmmbpr t n * ^> 

^ CQNDDfsl • ^^'^ ro ' L<D ' r ' 2 ' R ' C ^ Y 'S # DESCRIPTOR,IJE^ 
DO 'Compression function .prg' 
- 'ELSE 

DO 'Normal subroutine l ■ 
QJDIF 

? 'Ligands and effectors t 1 

SORT ON ENTRY, NUMBER FIELDS RFDJD NUMBER r n • • t> > 

IF COTO-1 ^' NU ^' L '°' ? ' Z '*'^^^ 

DO •Ccnpression function. pro' 

ELSE 

DO 'Normal subroutine !• 
ENDIF 

? 'Other binding proteins:' 

SORT CN ENTRY/NUMBER FIELDS RFEND NUMBPft r r» » » * ^ 

IF OONDEN*! ^^^^' L ' D ' r ' 2 '*' C '^ Y ' S ' DI ^^ 
DO ' Centres i ion function .pro" 

DO ■Normal subroutine 1* 

QJDIF 

•EJECT 

9 * 

; ONCOGENES' 
? 'General oncogenes t 1 

tf^r'** 02 * P1ELDS ^^UD^u^mv,^™,^,^,^ 

DO •Ccnpression lunction.pro' 

DO 'Normal subroutine !• 
INDIF 

? 'GTP -binding proteins i * 

SORT ON INTRY, NUMBER FIELDS RFEND NIMHP5 r n p * t> « 

IF CONDOM ™ S *^' NU ^' L ' D ' r ' Z ' R ' C '^^ 

DO '•Compression function.pro' 

Et^E 

DO 'Normal subroutine i« 
B©IF 

? 'Viral elemental* 
DO "Compression function.prg"
£LSS 

D O 'N ormal subroutine l 1 
BJDXF ■ 

? .'Kinases .and Phosphatases: ' 

SORT ON QJTKY,NUM3ER FIELDS WDID. NUMBER L D « 2 ft r P\mov c to^dt^ . ~« 

DO '•Cbnpressicn function .prg» v : . 



DO "Normal subroutine I s 
ENDIF 

? 'Tumor-related antigens i' 

DO 'Compression function. pro* 
H-SE 

DO ■Normal subroutine 1* 

D3DZF 

•EJECT 

J ' FROTEIN SYNTHETIC MACHBSRY PROTEINS' 

L^S ,c 5 iptioxi ^cleic Acid-binding proteins:' 

DO 'Compression function. pre' 
ELSE 

D O 'N ormal subroutine 1" 
Q$DXF 

? 'Translation: 1 

DO ' Compression function. pro' 
ELSE 

DO 'Normal subroutine 1* 
SNDIT 

? 'Riboscoal proteins:' 

DO 'Compression f unction. pro" 
ELSE 

DO/Norael subroutine 1' 
ENDX7 

? 'Protein processing i ' 

DO 'Compression function .pre". 
ELSE 

DO 'Normal subroutine 1" 

2NDIF 

•EJECT 

I ' ENZYMES' 
? 

? ■Ferroproteinsi • 

S R ^£S' RY ' NUKBER F1ELDS ^' N ^' L ' D -^.R.C,BraY ( S,D^^ 

DO •Compression f unction. prc B 
USE 

DO 'Normal subroutine 1* 

? 'Proteases and inhibitors:' 

DO 'Compression function.pro' 
DO "Normal subroutine 1"
7, 'O xidative phosphorylation: 1 

SORT ON EOT^Y.NUMEEH FI3LDS RFDJD, NUMBER, L.D F 2 R C rvrey c nrcraxtyrvw t ■ -,- 

DO: •Ccstpreasioa function. pro* 
ass , 

DO "Normal subroutine i« 
DDI?/ 

? 'Sugar 'jnetfiboliezni ' 

SORT ON EKI51Y,NuM32R FIELDS RFD©, NUMBER, T - D F 2 R C RTTPV c nrcroT^ 

IF' CONDOM. '" w ^'- #u **'*» R ' c ' H ' TO 'S # DESC3aPT0R l U^^ 

DO "Compression function.prg' 
ELSE 

DO 'Normal subroutine 1* 
DDXF - 

? 'Amino acid metabolism; * 

SORT ON DTOlY.NUlfiER FIELDS R?END,NUM3ER,L, D F 2 R C Etfrav e nrcruTTyf*-* 

IF CONDOM. w *^^ #t,u# *' z ' R ' c « s *w#S,DESCRITO 

DO 'Compression function. pre* 
ELSE 

DO'Jtarmal subroutine J' 

? •Nucleic acid metabolism; • 

SORT ON BTTRY, NUMBER FIELDS RFO*D, NUMBER. L. 13 >F 2 R r pwtov c TrcneTw^o *~ _ 

DO 'Compression function. prg" 
ELSE 

DO 'Normal subroutine 1* 
ENDIF 

? 'Lipid metabolism: ' 

SofflE"'™* ?IELDS ^' 1 ^' L ' D ' r ' Z ' R ' C <^.S'^^^^ 

DO "Compression function .pre' 

ELSE 

PO_ ^Normal subroutine !■ 
ENDTF 

? •Other enrymesij 

DO 'Compression function. pry' 
ELSE 

DO 'Normal subroutine 1" 

ENDIF 

•EJECT 

? 1 MISCELIANE0U6 CATEGORIES' 

7 'Stress 'response; ' 

S R iS^ RY ' ,WKESR FIELD£ ^'^ R ' L ' D ' r '2-^C.^.S.D ESffl IPI OT(I ^, 1HIT , 1(ai ^ 

DO 'Compression function. pro" 
ELSE 

DO 'Normal subroutine 1* 
ENDIF 

? 1 Structural t ' 

DO 'Compression function.prg' 
ELSE 

DO 'Normal subroutine 1" 
fiJDIF 

7 'Other clones!' 

SOWO^Y.NUMBER FIELDS RFD^,h^>ffi£R, L,D / ?,Z,R,C,QTOY,S;DE£CRJPTOK,I J ZICTH ( INIT, ^ 

DO •Compression function.prg' 
ELSE 
DO "Normal subroutine 1"
ZNDZF 

? 'Clones of - unknown functioni' 

SORT ON. NUMBER FIELDS RFEND.NLM3ER L D t» ? & r i^tov * t^**-^— 

IF CCNDE1W * 2 ' R ' C ' 2 ^*S,DESCRI^ 

DO •CenpxeeBion function .pry - 
ELSE 

DO ■Karnail ' subroutine 1' 
©DIP • 

IFCONDQfel - f * . 

EJECT - 

•SET DEVICE TO FRINJER 

•SET PRINT ON 

DO * Output heading .prg" 

USE* •Analysis function. dbf" 

DO .^Create bargreph.prg" 

SET HEADING OFF 
«** 

SCREEN 1 T*FE 0 HEADXN3 "Screen 1- AT 40,2 SIZE 2<>6,<92 PIXELS PONT -CtamMa COLOR 0.0,0 
? • 

? • ' FUNCTIONAL CLASS ~«v«« TOTAL TOTAL NEW DIST 

•? t ruwwaiuiwUrf LXA5S CLONES GENES GENES FUNCTIONAL CLASS 1 

*•* 

•LIST OFT FIELDS P, NAME, CLONES, G2NES, NEW, PERCENT GRAPH ctvpjimv 
LIST OFF FIELDS F,NAME. CLO^.m.^P^Sow^'^^ 
CLOSE DATABASES 
ERASE TEKP2.D6F 
SET HEADING ON 

•USE •SrrartGuy:PoxBASE+/Macifox files rTEMPMASTER dbf 

mniF * 

CASE ANALc8 

DO "Subgroup eumnary 3.prg" 
DJDCASE 

DO "Test print. pry" 

SET PRINT OFF 

SET DEVICE TO SCRffiN 

CLOSE DATABASES 

•ERASE IIHPLIB.DBF 

•ERASE OIMPNUM.DBF 

•ERASE TO ffiDBSI C.DBr 

* ERASE SELECTED, nap 

CLEAR 

LOOP 

cNDDO 
USE 

OOONT TO. TO T 

REPLACE ALL KFEJJD WITH'l 
KARK1 = 1 
SW2-0 . 

DO WHILE SW2«=0 ROLL 
IF MARK1 >»-TOT 
PACK. 

COUNT TO 'UNIQUE 

COUOT TO NEKQSWLS FOR Ds'H* .OR.Dc'O* 

SW2»1 

LOOP 

•GO MARK1 , 
OOP « 1 ' 

STORE DOTY TO TESTA 
SW - 0 

DO WHILE SWsO TOST 

STORE ENISY TO TBSTB 

IF TESTA c TESTS 

DELETE 

DUP s DUPtI 

LOOP 

ENDXF 
GO MARK1. 

REPLACE RFEND WITH OTP 
MARX1 - HARJU-rDUP 
SW=1 
LOOP 

ENDDO TEST 
LOOP 

ENDDO ROLL 
•CO TOP 

STORE Z TO LOC * 

USE •Analysis location. dbf ■ 

LOCATE FOR ^LOC 

REPLACE CLONES WITH TOT 

REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
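
The compression subroutine above walks a table sorted by entry, deletes repeated records, and stores the repeat count in RFEND before re-sorting by that count. The same idea can be sketched in Python as follows. This is an illustrative restatement only, not part of the original listing; the function name `compress` and all entry names other than HUMEF1B are hypothetical.

```python
from itertools import groupby


def compress(entries):
    """Collapse a list of entry names into (entry, count) pairs.

    Mirrors the listing's approach: sort so that duplicates are adjacent,
    then keep one record per run and remember how long the run was.
    """
    ordered = sorted(entries)
    return [(name, len(list(group))) for name, group in groupby(ordered)]


# Hypothetical clone entries; one item per sequenced clone.
clones = ["HUMEF1B", "HUMACT", "HUMEF1B", "HUMGAPDH", "HUMEF1B", "HUMACT"]

compressed = compress(clones)
tot = len(clones)         # corresponds to TOT (clones analyzed)
unique = len(compressed)  # corresponds to UNIQUE (distinct genes)

# Corresponds to SORT ON RFEND/D: most frequently seen entries first.
by_abundance = sorted(compressed, key=lambda pair: pair[1], reverse=True)

for name, count in by_abundance:
    print(f"{count:5d}  {name}")
print(f"{unique} genes, for a total of {tot} clones")
```

Grouping after a sort is used here because it matches the record-by-record comparison in the listing; a dictionary-based counter would give the same totals.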
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON DATE TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,4,0)
?? ' clones'
?
? ' V Coincidence'

COUNT TO P4 FOR I=4
IF P4>0
? STR(P4,3,0)
?? ' genes with priority = 4 (Secondary analysis:)'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR for I=4
?
ENDIF
COUNT TO P3 FOR I=3
IF P3>0
? STR(P3,3,0)
?? ' genes with priority = 3 (Full insert sequence:)'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=3
ENDIF
COUNT TO P2 FOR I=2
IF P2>0
? STR(P2,3,0)
?? ' genes with priority = 2 (Primary analysis complete:)'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=2
ENDIF
COUNT TO P1 FOR I=1
IF P1>0
? STR(P1,3,0)
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON NUMBER TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
COUNT TO NEWGENES FOR D='H' .OR. D='O'
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
SET HEADING ON
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
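
The function-class variant above records three numbers per class: clones (TOT), distinct genes (UNIQUE), and new genes (NEWGENES, counted where D is 'H' or 'O'). A minimal Python sketch of that bookkeeping follows. It assumes each record is an (entry, designation) pair and that designation codes 'H' and 'O' mark non-exact matches, as the heading program's 'Human' and 'Other sp.' labels suggest; the function name and the sample entries other than HUMEF1B are hypothetical.

```python
def summarize_class(records):
    """Return clone, gene, and new-gene counts for one functional class.

    records: list of (entry, designation) tuples, one per clone.
    """
    designation_by_entry = {}
    for entry, designation in records:
        # Keep the first designation seen for each distinct entry,
        # as the listing keeps the first record of each duplicate run.
        designation_by_entry.setdefault(entry, designation)

    new_genes = sum(
        1 for d in designation_by_entry.values() if d in ("H", "O")
    )
    return {
        "clones": len(records),
        "genes": len(designation_by_entry),
        "new": new_genes,
    }


# Hypothetical records for a single functional class.
records = [
    ("HUMEF1B", "E"),
    ("HUMEF1B", "E"),
    ("HYPO1", "H"),
    ("HYPO2", "O"),
    ("HYPO1", "H"),
]
summary = summarize_class(records)
print(summary)
```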
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE P TO DIST
USE "Analysis distribution.dbf"
LOCATE FOR P=DIST
REPLACE CLONES WITH TOT 
REPLACE GENES WITH UNIQUE 
USE TEMP1
sort on rfend/d to TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
USE TEMP1
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS

copy to raci for 

USB ■ ' 

COUNT TO IDGENE POR D»'E' .OR.Ds'O 1 .OR D=»v» on n-. M , ™ ^ 

gs.™ :K;:S:£.s::K:s:s:{:. aj . T 

COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D,NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' Coincidence V   V Clones/10000'
set heading off
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
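
The statement REPLACE ALL START WITH RFEND/IDGENE*10000 in the listing above rescales each entry's match count to a frequency per 10,000 identified clones, so that libraries of different sizes can be compared. A hedged Python sketch of that arithmetic follows; the function name and the sample counts are hypothetical, and the multiplication is done before the division only to keep the floating-point result tidy.

```python
def abundance_per_10000(match_counts, identified_total):
    """Scale raw match counts to clones per 10,000 identified clones.

    match_counts: dict mapping entry name -> number of matching clones
                  (the RFEND value after compression).
    identified_total: number of clones with an identified designation
                  (the IDGENE value).
    """
    if identified_total <= 0:
        raise ValueError("identified_total must be positive")
    return {
        entry: count * 10000 / identified_total
        for entry, count in match_counts.items()
    }


# Hypothetical counts from a library with 1,200 identified clones.
counts = {"HUMEF1B": 6, "HUMACT": 3}
scaled = abundance_per_10000(counts, 1200)

for entry, value in sorted(scaled.items(), key=lambda kv: -kv[1]):
    print(f"{entry:10s} {value:8.1f} per 10,000")
```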
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1

■ COUNT TO IDCENE FOR D*'S' .OR.Dr'O* ,OR,D* 'H' .OR.Ds'N 1 OR Ds'R' or 0=.*. 
PftEK^* ^ C *^*^ = "OR.Dr 'A* .OR,D= 'U 1 .Or!d= 'S 1 I^r!^« 'M' !qr!d&*R' .OR.D» 'V 

COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D,NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' Coincidence V   V Clones/10000'
set heading off
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?

ERASE TEMP1.DBF
*Lifeseq menu, version 8-7-94
SET TALK OFF
set device to screen
CLEAR
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
STORE LUPDATE() TO Update
GO BOTTOM
STORE RECNO() TO cloneno
STORE 6 TO Chooser
DO WHILE .T.
* Program.: Lifeseq menu.fmt
* Date....: 1/11/95
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Lifeseq menu



SCREEN 1 TYPE 0 KEADIN3 •Screen 1 B AT 40.2 SIZE 286,492 PIXELS FONT "Geneva- Sfifi onrr« n n 

« PIXELS 18,126 TO 77,365 STYLE 2B479 COLOR 32767, -2S600; •^622^167^i 5 4il 8 C0L0R °' 0 ' 

0 PIXELS 110,29 TO 188,217 STYI2 3871 COLOR 0,0,-1 ,-25600, -L*l ! ' 15725 

0 PIXELS 45,161 SAY "LircSEQ ,i STYLE 63536 F0CTT •Ceneva',536 COLOR 0.0 -1 -1 7135 CflBd 

0 PCm 36;269 SAY STYLE =65536 PONT -Geneve'^ COLOR oX-l? 1° 7135 5864 

0 PI3E£ 63,143 SAY 'Molecular Biology Desktop- STYLE 65536 FCNT -Helve-^ca- lfi CQI^R 0 o o 

6 PIXELS 117,270 GET Chooser STYLE 65536 FONT 'Chicago*, 12 PICTURE «0*RV frv«n*^«*- — 
0 PIXELS 135,128 SAY Update STYLE 0 PONT *Geneva^l2 SI2E 15?79^XIXJR oV^ISF f^ 11 ?* 

• PIXELS 171,128 SAY cloneno STYLE 0 TOOT 'Geneve-,12 SIZE 15,79^cSr 6 0 6 -!I2So~^l 
9 PIXELS 135,44 SAY -Last update:- STYLE 65536 -(fene^Ttt COLOR 0 6 ll'I? 5 ll f 

0 PIXELS 45,296 SAY 'vl.30- STYLE 65536 FCOT •Geneva', 762 COLOR 0, o!-?l, -1,-1,-1 

* EOF: Lifeseq menu.fmt
READ 

DO CASE 

CASE Chooser=1
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Master analysis 3.prg"
CASE Chooser=2
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Subtraction 2.prg"
CASE Chooser=3
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Northern (single).prg"
CASE Chooser=4
USE "Libraries.dbf"
BROWSE
CASE Chooser=5
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:See individual clone.prg"
CASE Chooser=6
DO "SmartGuy:FoxBASE+/Mac:fox files:Libraries:Output programs:Menu.prg"
CASE Chooser=7
CLEAR
SCREEN 1 OFF
RETURN
ENDCASE
LOOP
ENDDO
@ PIXELS 61,30 SAY 'Database Subset Analysis' STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1
? date()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
?? 'All libraries'
ENDIF
IF ENTIRE=2
MARK=1
DO WHILE .T.
IF MARK>STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
? ' '
?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0
?? 'All'
ENDIF
IF Ematch=1
?? 'Exact, '
ENDIF
IF Hmatch=1
?? 'Human, '
ENDIF
IF Omatch=1
?? 'Other sp. '
ENDIF

IF CONDEN=1
? 'Condensed format analysis'
ENDIF
IF ANAL=1
? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
?
USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
USE TEMP1
COUNT TO TOT
?? ' Total of '
?? ' clones'
?
*list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
*Northern (single), version 11-25-94
close databases
SET TALK OFF
SET PRINT OFF
SET EXACT ON
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)



SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0

I l^i 11% £ fS; 3 ?^*™* 28447 "« o.o.-a.-2s«o!-E-i ,12 

@ PIXELS 69,79 TO 192,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1

l ^^i ^c'??!^"?^*-'' STn£ 65536 m 'Geneva M2 COLOR 0,0,0, -l.-l.-x 

@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1

1 ?ff'??, 9 ^'^ ript iss: ^ 65536 w» ^iS'S coiS^fo?o?:i;:i;:i' 1 

e PIXELS 145,173 GET Debject STYLE 0 FONT •Geneva", 12 SIZE 15,241 CQLoi 0 o oil i 

6 PIXELS 35,89 SAr -Single Korthem search screen' CTYLE65536 FOOT^tei^i^i'^^ 0 0 

PIXELS 80,152 SAy -Enter any ONE of the f ollo^: 2 !^ ^siiTrok -S^^-Iu COLOR -1, 



* EOF: Northern (single).fmt
READ

IF Bail=2
CLEAR
screen 1 off
RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON
IF Eobject<>' '
STORE UPPER(Eobject) to Eobject
SET SAFETY OFF
SORT ON Entry TO "Lookup entry.dbf"
SET SAFETY ON
USE "Lookup entry.dbf"
LOCATE FOR Look=Eobject
IF .NOT.FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF

IF Dobject<>' '
SET EXACT OFF
SET SAFETY OFF
SORT ON descriptor TO "Lookup descriptor.dbf"
SET SAFETY ON
USE "Lookup descriptor.dbf"
LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
IF .NOT.FOUND()
CLEAR
loop
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF

IF Numb<>0
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
GO Numb
BROWSE
STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
screen 1 off
RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
SW2=1
LOOP
ENDIF
GO MARK1
STORE library TO TESTA
SKIP
STORE library TO TESTB
IF TESTA = TESTB
DELETE
ENDIF
MARK1 = MARK1+1
LOOP
ENDDO ROLL



* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
CLOSE DATABASES
SELECT 1
USE "Compressed libraries.dbf"
STORE RECCOUNT() TO Entries
USE "Hits.dbf"
DO WHILE .T.
SELECT 1
IF Mark>Entries
EXIT
ENDIF
GO MARK
STORE library TO Jigger
SELECT 2
COUNT TO Zog FOR library=Jigger
SELECT 1
REPLACE hits WITH Zog
Mark=Mark+1
LOOP
ENDDO

SELECT 1
BROWSE FIELDS LIBRARY,LIBNAME,ENTERED,HITS AT 0,0
CLEAR
? 'Enter Y to print:'
WAIT TO PRINSET
IF UPPER(PRINSET)='Y'
SET PRINT ON
CLEAR
EJECT
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",14 COLOR 0,0,0
? 'DATABASE ENTRIES MATCHING ENTRY '
?? Searchval
? DATE()
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS library,libname,entered,hits
?
?
LIST OFF FIELDS NUMBER,LIBRARY,D,S,F,Z,R,ENTRY,DESCRIPTOR,RFSTART,START,RFEND
SET TALK OFF
SET PRINT OFF
ENDIF
CLOSE DATABASES
SET TALK OFF
CLEAR
DO "Test print.prg"
RETURN
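
The Northern (single) program above copies every clone whose entry matches the search value into Hits.dbf, then counts the hits per library. The counting step can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the function name is hypothetical, clone records are simplified to (library, entry) pairs, and the sample clone records are invented, although the library codes and the entry HUMEF1B appear in Table 6.

```python
from collections import Counter


def electronic_northern(clones, search_entry, libraries):
    """Count, per library, the clones whose entry equals search_entry.

    clones: iterable of (library_code, entry) pairs.
    libraries: list of (library_code, library_name) pairs; every library
               is reported, including those with zero hits.
    """
    hits = Counter(
        library for library, entry in clones if entry == search_entry
    )
    return [(code, name, hits[code]) for code, name in libraries]


libraries = [
    ("HMC1NOT01", "Mast cell line HMC1"),
    ("U937NOT01", "U937, monocytic leuk"),
    ("LIVRNOT01", "Liver (T)"),
]

# Hypothetical clone records.
clones = [
    ("U937NOT01", "HUMEF1B"),
    ("HMC1NOT01", "HUMEF1B"),
    ("HMC1NOT01", "HUMEF1B"),
    ("LIVRNOT01", "HUMALB"),
    ("HMC1NOT01", "HUMACT"),
]

report = electronic_northern(clones, "HUMEF1B", libraries)
for code, name, count in report:
    print(f"{code:10s} {name:22s} {count:4d}")
```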
TABLE 6

library     libname
ADENINB01   Inflamed adenoid
ADRENOR01   Adrenal gland (r)
ADRENOT01   Adrenal gland (T)
AMLBNOT01   AML blast cells (T)
BMARNOT01   Bone marrow
BMARNOT02   Bone marrow (T)
CARDNOT01   Cardiac muscle (T)
CHAONOT01   Chin. hamster ovary
CORNNOT01   Corneal stroma
FIBRAGT01   Fibroblast, AT 5
FIBRAGT02   Fibroblast, AT 30
FIBRANT01   Fibroblast, AT
FIBRNGT01   Fibroblast, uv 5
FIBRNGT02   Fibroblast, uv 30
FIBRNOT01   Fibroblast
FIBRNOT02   Fibroblast, normal
HMC1NOT01   Mast cell line HMC1
HUVELPB01   HUVEC IFN,TNF,LPS
HUVENOB01   HUVEC control
HUVESTB01   HUVEC shear stress
HYPONOB01   Hypothalamus
KIDNNOT01   Kidney (T)
LIVRNOT01   Liver (T)
LUNGNOT01   Lung (T)
MUSCNOT01   Skeletal muscle (T)
OVIDNOB01   Oviduct
PANCNOT01   Pancreas, normal
PITUNOR01   Pituitary (r)
PITUNOT01   Pituitary (T)
PLACNOB01   Placenta
SINTNOT02   Small intestine (T)
SPLNFET01   Spleen+liver, fetal
SPLNNOT02   Spleen (T)
STOMNOT01   Stomach
SYNORAB01   Rheum. synovium
TBLYNOT01   T + B lymphoblast
TESTNOT01   Testis (T)
THP1NOB01   THP-1 control
THP1PEB01   THP phorbol
THP1PLB01   THP-1 phorbol LPS
U937NOT01   U937, monocytic leuk



number  library    d s f z r  entry    descriptor                rfstart  start  rfend
2304    U937NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0               773
3240    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        370    773
3269    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        371    773
46S3    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        470    773
89E9    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        327    773
S139    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        375    773
WHAT IS CLAIMED IS:

1. A method of analyzing a specimen containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;

(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.

2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;
making cDNA copies of the mRNA;
isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological
sequences are protein sequences.
6. The method of claim 1, wherein a first value of
said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.

7. A method of comparing two specimens containing
gene transcripts, said method comprising:

(a) analyzing a first specimen according to the
method of claim 1;

(b) producing a second library of biological
sequences;

(c) generating a second set of transcript sequences,
where each of the transcript sequences in said second set
is indicative of a different one of the biological
sequences of the second library;

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of
differences in numbers of gene transcripts between the two
specimens.

8. A method of quantifying relative abundance of mRNA
in a biological specimen, said method comprising the steps
of:

(a) isolating a population of mRNA transcripts from
the biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of:

(a) isolating a population of mRNA transcripts from a
biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.

10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximate the gene
transcript image of the biological specimen.
11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.

12. A method of producing a gene transcript image,
said method comprising the steps of:

(a) obtaining a mixture of mRNA;

(b) making cDNA copies of the mRNA;

(c) inserting the cDNA into a suitable vector and
using said vector to transfect suitable host strain cells
which are plated out and permitted to grow into clones,
each clone representing a unique mRNA;

(d) isolating a representative population of
recombinant clones;

(e) identifying amplified cDNAs from each clone in
the population by a sequence-specific method which
identifies gene from which the unique mRNA was transcribed;

(f) determining a number of times each gene is
represented within the population of clones as an
indication of relative abundance; and

(g) listing the genes and their relative abundance in
order of abundance, thereby producing the gene transcript
image.

13. The method of claim 12, also including the step
of diagnosing disease by:

repeating steps (a) through (g) on biological
specimens from random sample of normal and diseased humans,
encompassing a variety of diseases, to produce reference
sets of normal and diseased gene transcript images;

obtaining a test specimen from a human, and producing
a test gene transcript image by performing steps (a)
through (g) on said test specimen;

comparing the test gene transcript image with the
reference sets of gene transcript images; and

identifying at least one of the reference gene
transcript images which most closely approximates the test
gene transcript image.

14. A computer system for analyzing a library of
biological sequences, said system including:

means for receiving a set of transcript sequences,
where each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and

means for processing the transcript sequences in the
computer system in which a database of reference transcript
sequences indicative of reference biological sequences is
stored, wherein the computer is programmed with software
for generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and a degree
of match between a different one of the biological
sequences of the library and at least one of the reference
transcript sequences, and for processing each said
identified sequence value to generate final data values
indicative of a number of times each identified sequence
value is present in the library.

15. The system of claim 14, also including:

library generation means for producing the library of
biological sequences and generating said set of transcript
sequences from said library.

16. The system of claim 15, wherein the library
generation means includes:

means for obtaining a mixture of mRNA;
means for making cDNA copies of the mRNA;
means for inserting the cDNA copies into cells and
permitting the cells to grow into clones;

means for isolating a representative population of the
clones and producing therefrom the library of biological
sequences.
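
The comparisons recited in claims 7, 10 and 13 (ratios between two specimens, and finding the reference gene transcript image that most closely approximates a test image) can be illustrated with a short Python sketch. The claims do not specify how closeness is measured; the sum of absolute differences in relative abundance used below is an assumption chosen for simplicity, and all function names, gene labels and counts are hypothetical.

```python
def transcript_image(counts):
    """Convert per-gene clone counts into relative abundances."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no transcripts counted")
    return {gene: n / total for gene, n in counts.items()}


def abundance_ratios(image_a, image_b):
    """Ratio of relative abundance (a over b) for genes seen in both."""
    shared = image_a.keys() & image_b.keys()
    return {gene: image_a[gene] / image_b[gene] for gene in shared}


def closest_reference(test_image, references):
    """Name of the reference image nearest to test_image.

    Distance is the sum of absolute abundance differences over all genes
    present in either image (an assumed metric, not one from the claims).
    """
    def distance(a, b):
        genes = a.keys() | b.keys()
        return sum(abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in genes)

    return min(references, key=lambda name: distance(test_image, references[name]))


# Hypothetical clone counts per gene for two reference specimens and a test.
references = {
    "normal": transcript_image({"GENE_A": 5, "GENE_B": 5}),
    "diseased": transcript_image({"GENE_A": 9, "GENE_B": 1}),
}
test = transcript_image({"GENE_A": 8, "GENE_B": 2})

ratios = abundance_ratios(test, references["normal"])
best = closest_reference(test, references)
print(best, ratios)
```

Genes absent from one image are skipped when forming ratios to avoid division by zero; a real analysis would need an explicit policy for such genes.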
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